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orf75-l ATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALSADGNQSRGEMVLVLYPAQD 
190 200 210 220 230 240 

250 260 270 280 290 

orfVSa.pep EKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYDLALSWKNKX 

orf75-l EKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYDLALSWKNKX 
250 260 270 280 290 



Homology with a predicted ORF from N.sonorrhoeae 

ORF75 shows 93.2% identity over a 292aa overlap with a predicted ORF (ORF75.ng) from A^. 
gonorrhoeae: 

MFVFQTAFXMFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKA AEDTR 56 

MSVFQTAFFMFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVL^ 60 

VTAQLLSAYGIQGKLVSVREHNERQMADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLAR 116 

VTAQLLSAYGIQGRLVSVREHNERQMADKVIGFLSDGLWAQVSDAGTPAVCDPGAKLAR 120 

RVREAGFKVVPWGAXAVMAALSVAGVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIV 176 

RVREAGFKVVPWGASAVMAALSVAGVAESDFYFNGFVPPKSGERRKLFAKWVRAAFPVV 180 



orf75.pep 
orf75ng 
orf75.pep 
orf75ng 
orf 75 .pep 
orf75ng 
orf 75 .pep 
orf 75ng 
orf75.pep 
orf 75ng 



MFETPHRIGAALADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALSADGDQSRGEM 236 



111:1 



I I I 



I I I 



I I I 



I I I 



I I I I I I 



Mill: 



MFETPHRIGATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEM 

VLVLYPAQDEKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYD 

VLVLYPAQDEKHEGLSESAQNAMKILAAELPTKQAAELAAKITGEGKKALYDLALSWKNK 



An ORF75ng nucleotide sequence <SEQ ID 291> was predicted to encode a protein having amino 
acid sequence <SEQ ID 292>: 

1 MSVFQTAFFM FQKHLQKASD SWGGTLYW ATPIGNLADI TLRALAVLQK 

51 ADIICAEDTR VTAQLLSAYG IQGRLVSVRE HNERQ^4ADKV IGFLSDGLW 

101 AQVSDAGTPA VCDPGAKLAR RVREAGFK VV PVVGASAVMA ALSVA GVAES 

151 DFYFNGFVPP KSGERRKLFA KWVRAAFPVV MFETPHRIGA TLADMAELFP 

201 ERRLMLAREI TKTFETFLSG TVGEIQTALA ADGNQSRGEM VLVLYPAQDE 

251 KHEGLSESAQ NAMKILAAEL PTKQAAELAA KITGEGKKAL YDLALSWKNK 



After fiirther analysis, the following gonococcal DNA sequence <SEQ ID 293> was identified: 



ATGTTTCAGA 
ATTATACGTG 
GCGCTTTGGC 
CGCGTTACTG 
CAGTGTGCGC 
TCCTTTCAGA 
GCCGTGTGCG 
GTTCAAAGTC 
GTGTGGCCGG 
CCGAAATCGG 
ATTTCCTGTC 
CCGATATGGC 
ATCACGAAAA 
GACGGCATTG 
TGCTTTATCC 
CAAAATGCGA 
GGAGCTTGCC 
TGGCACTGTC 



AACACTTGCA 
GTTGCCACGC 
GGTATTGCAA 
CGCAGCTTTT 
GAACACAAGG 
CGGCCTGGTT 
ACCCGGGCGC 
GTTCCCGTCG 
TGTGGCGGAA 
GCGAACGTAG 
GTCATGTTTG 
GGAATTGTTC 
CGTTTGAAAC 
GCGGCGGACG 
GGCGCAGGAT 
TGAAAATCCT 
GCCAAGATTA 
GTGGAAAAAC 



GAAAGCCTCC 
CCATCGGCAA 
AAGGCGGACA 
GAGCGCGTAC 
AGCGGCAGAT 
GTGGCGCAGG 
GAAACTCGCC 
TGGGCGCAAG 
TCCGATTTTT 
GAAATTGTTT 
AAACGCCGCA 
CCCGAACGCC 
GTTCTTAAGC 
GCAACCAATC 
GAAAAACACG 
TGCGGCCGAG 
CAGGTGAGGG 
AAATGA 



GACAGCGTCG 
TTTGGCAGAC 
TCATTTGTGC 
GGCATTCAGG 
GGCGGACAAG 
TTTCCGATGC 
CGCCGCGTGC 
CGCGGTAATG 
ATTTCAACGG 
GCCAAATGGG 
CCGAATCGGG 
GTCTGATGCT 
GGCACGGTTG 
GCGCGGCGAG 
AAGGCTTGTC 
CTGCCGACCA 
CAAAAAGGCT 



TCGGAGGGAC 
ATTACCCTGC 
CGAAGACACG 
GCAGGTTGGT 
GTAATCGGTT 
GGGTACGCCG 
GCGAAGCAGG 
GCGGCGTTGA 
TTTTGTACCG 
TGCGGGCGGC 
GCAACGCTTG 
GGCGCGCGAA 
GGGAAATTCA 
ATGGTGTTGG 
CGAGTCTGCG 
AGCAGGCGGC 
TTGTACGATT 



This corresponds to the amino acid sequence <SEQ ID 294; ORF75ng-l>: 
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1 MFQKHLQBCftS DSWGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT 

51 RVTAQLLSAY GIQGRLVSVR EHNERQMADK VIGFLSDGLV VAQVSDAGTP 

101 AVCDPGAKLA RRVREAGFK V VPWGASAVM AALSVA GVAE SDFYFNGFVP 

151 PKSGERRKLF AKWVRAAFPV VMFETPHRIG ATLADMAELF PERRLMLARE 

201 ITKTFETFLS GTVGEIQTAL AADGNQSRGE MVLVLYPAQD EKHEGLSESA 

251 QNAMKILAAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K* 

ORF75ng-l and ORF75-1 show 96.2% identity in 291 aa overlap: 



. . pep MFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAY 
[-1 MFQKHLQKASDSWGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAY 



orf75-l.pep 
orf75ng-l 



GIQGKLVSVREHNERQMADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLARRVREAGFKV 
GIQGRLVSVREHNERQMADKVIGFLSDGLVVAQVSDAGTPAVCDPGAKLARRVREAGFKV 



orf 75-1 . pep VPVVGASAVMAALSVAGVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIVMFETPHRIG 
orf75ng-l VPWGASAVMAALSVAGVAESDFYFNGFVPPKSGERRKLFAKWVRAAFPWMFETPHRIG 



>rf 75-1 . pep ATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALSADGNQSRGEMVLVLYPAQD 
>rf75ng-l ATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQD 



orf 75-1 . pep EKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYDLALSWKNKX 
orf75ng-l EKHEGLSESAQNAMKILAAELPTKQAAELAAKITGEGKKALYDLALSWKNKX 



Furthermore, ORG75ng-l shows significant homology to a hypothetical E.coli protein: 

spl P45528 I YRAL_ECOLI HYPOTHETICAL 31.3 KD PROTEIN IN AGAI-MTR INTERGENIC REGION 
(F286) 

>gi I 606085 (018997) 0RF_f286 [Escherichia coli] 

>gi 1 1789535 (AE000395) hypothetical 31.3 kD protein in agai-mtr intergenic 
region [Escherichia coli] Length = 286 
Score = 218 bits (550), Expect = 3e-56 

Identities = 128/284 (45%), Positives = 171/284 (60%), Gaps = 4/284 (1%) 

KHLQKASDSWGGTLYWATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAYGIQ 63 
K Q A +S G LY+V TPIGNLADIT RAL VLQ D+I AEDTR T LL +GI 
KQHQSADNSQ— GQLYIVPTPIGNLADITQRALEVLQAVDLIAAEDTRHTGLLLQHFGIN 5 9 

GRLVSVREHNERQMADKVIGFLSDGLWAQVSDAGTPAVCDPGAKLARRVREAGFKWPV 123 

RL ++ +HNE+Q A+ ++ L +G +A VSDAGTP + DPG L R REAG +WP+ 
ARLFALHDHNEQQKAETLLAKLQEGQNIALVSDAGTPLINDPGYHLVRTCREAGIRWPL 119 



■ +L AELP K+AA LAA+I G K ALY AL 







Sbjct: 


2 


Query: 


64 


Sbjct: 


60 




124 


Sbjct: 


120 






Sbjct: 






243 


Sbjct: 


239 
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Based on this analysis, including the presence of a putative transmembrane domain in the 
gonococcal protein, it is predicted that the proteins from N. meningitidis and N.gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 35 

The following partial DNA sequence was identified in N. meningitidis <SEQ E) 295>: 

1 ATGAAACAGA AAAAAACCGC TGCCGCAGTT ATTGCTGCAA TGTTGGCAGG 

51 TTTTGCGGCA GC.AAAGCAC CCGAAATCGA CCCGGCTTTG 

// 

651 GAGTTGG TCAGAAACCA GTTGGAGCAG GGTTTGAGAC 

701 AGGAAAAAGC CCGCTTGAAA ATCGATGCCC TTTTGGAAGA AAACGGTGTC 
751 AAACCGTAA 

This corresponds to the amino acid sequence <SEQ ID 296; ORF76>: 

1 MKQKKTAAAV lAAMLAGFAA XKAPEIDPAL 

// 

201 ELVRNQLEQG LRQEKARLKI DALLEENGVK 

251 P* 

Further work revealed the complete nucleotide sequence <SEQ ID 297>: 



1 ATGAAACAGA AAAAAACCGC TGCCGCAGTT ATTGCTGCAA TGTTGGCAGG 

51 TTTTGCGGCA GCCAAAGCAC CCGAAATCGA CCCGGCTTTG GTGGATACGC 

101 TGGTGGCGCA GATCATGCAG CAGGCAGACC GGCATGCGGA GCAGTCCCAA 

151 AAACCGGACG GGCAGGCAAT CCGAAACGAT GCCGTCCGCC GGCTACAAAC 

201 TTTGGAAGTT TTGAAAAACA GGGCATTGAA GGAAGGTTTG GATAAGGATA 

251 AGGATGTCCA AAACCGCTTT AAAATCGCCG AAGCGTCTTT TTATGCCGAG 

301 GAGTACGTCC GTTTTCTGGA ACGTTCGGAA ACGGTTTCCG AAGACGAGCT 

351 GCACAAGTTT TACGAACAGC AAATCCGCAT GATCAAATTG CAGCAGGTCA 

401 GCTTCGCAAC CGAAGAGGAG GCGCGTCAGG CGCAGCAGCT CCTGCTCAAA 

451 GGGCTGTCTT TTGAAGGGCT GATGAAGCGT TATCCGAACG ACGAGCAGGC 

501 TTTTGACGGT TTCATTATGG CGCAGCAGCT TCCCGAGCCG CTGGCTTCGC 

551 AGTTTGCCGC GATGAATCGG GGCGACGTTA CCCGCGATCC GGTCAAATTG 

601 GGCGAACGCT ATTATCTGTT CAAACTCAGC GAGGTCGGGA AAAACCCCGA 

651 CGCGCAGCCT TTCGAGTTGG TCAGAAACCA GTTGGAGCAG GGTTTGAGAC 

701 AGGAAAAAGC CCGCTTGAAA ATCGATGCCC TTTTGGAAGA AAACGGTGTC 

751 AAACCGTAA 

This corresponds to the amino acid sequence <SEQ ID 298; ORF76-l>: 



1 MKQKKTAAAV lAAMLAGFAA AKA PEIDPAL VDTLVAQIMQ QADRHAEQSQ 

51 KPDGQAIRND AVRRLQTLEV LKNRALKEGL DKDKDVQNRF KIAEASFYAE 

101 EYVRFLERSE TVSEDELHKF YEQQIRMIKL QQVSFATEEE ARQAQQLLLK 

151 GLSFEGLMKR YPNDEQAFDG FIMAQQLPEP LASQFAAMNR GDVTRDPVKL 

201 GERYYLFKLS EVGKNPDAQP FELVRNQLEQ GLRQEKARLK IDALLEENGV 

251 KP* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORE from N.meninsitidis (strain A) 

ORF76 shows 96.7% identity over a 30aa overlap and 96.8% identity over a 3 laa overlap with an 
ORE (ORF76a) from strain A ofN. meningitidis: 

10 20 30 

orf7 6.pep MKQKKTAAAV I AAMLAG FAAXKA PE I D PAL 

or f 7 6a MKQKKTAAAVIAAMLAGFAAAKA PEIDPALVDTLVAQIMQQADRHAEQSQKPDGQAIRMD 
10 20 30 40 50 60 

// 

70 80 90 
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ir f 7 6 . pep XELVRNQLEQGLRQEKARLKI DALLEENGVKPX 

,rf7 6a DVTRDPVKLGERYYLFKLSEVGKNPDAQPFELVRNQLEQGLRQEKARLKIDAILEENGVKPX 



The complete length ORF76a nucleotide sequence <SEQ LD 299> is: 



ATGAAACAGA 
TTTTGCGGCA 
TGGTGGCGCA 
AAACCGGACG 
TTTGGAAGTT 
AGGATGTCCA 
GAGTACGTCC 
GCGTCAGTTT 
GCTTCGCAAC 
GGGCTGTCTT 
TTTTGACGGT 
AGTTTGCAGC 
GGCGAACGCT 
CGCGCAGCCT 
AGGAAAAAGC 
AAACCGTAA 



AAAAAACCGC 
GCCAAAGCAC 
GATCATGCAG 
GGCAGGCAAT 
TTGAAAAACA 



GTTTTCTGGA 
TATGAGCGGC 
CGAAGAGGAG 
TTGAAGGGCT 
TTCATTATGG 
GATGAATCGG 
ATTATCTGTT 
TTCGAGTTGG 
CCGCTTGAAA 



TGCCGCAGTT 
CCGAAATCGA 
CAGGCAGACC 
CCGAAACGAT 
GGGCATTGAA 
AAAATCGCCG 
ACGTTCGGAA 
AAATCCGCAT 
GCGCGTCAGG 
GATGAAGCGT 
CGCAGCAGCT 
GGCGACGTTA 
CAAACTCAGC 
TCAGAAACCA 
ATCGATGCCA 



ATTGCTGCAA 
CCCGGCTTTG 
GGCATGCGGA 
GCCGTCCGTC 
GGAAGGTTTG 
AAGCGTCTTT 
ACGGTTTCCG 
GATCAAATTG 
CGCAGCAGCT 
TATCCGAACG 
TCCCGAGCCG 
CCCGCGATCC 
GAGGTCGGGA 
GTTGGAACAA 
TTTTGGAAGA 



TGTTGGCAGG 
GTGGATACGC 
GCAGTCCCAA 
GGCTGCAAAC 
GATAAGGATA 
TTATGCCGAG 
AAAGCGCACT 
CAGCAGGTCA 
CCTGCTCAAA 
ACGAGCAGGC 
CTGGCTTCGC 
GGTCAAATTG 
AAAACCCCGA 
GGTTTGAGAC 
AAACGGTGTC 



This encodes a protein having amino acid sequence <SEQ ID 300>: 

1 MKQKKTAAAV lAAMLAGFAA AKA PEIDPAL VDTLVAQIMQ QADRHAEQSQ 

51 KPDGQAIRND AVRRLQTLEV LKNRALKEGL DKDKDVQNRF KIAEASFYAE 

101 EYVRFLERSE TVSESALRQF YERQIRMIKL QQVSFATEEE ARQAQQLLLK 

151 GLSFEGLMKR YPNDEQAFDG FIMAQQLPEP LASQFAAMNR GDVTRDPVKL 

201 GERYYLFKLS EVGKNPDAQP FELVRNQLEQ GLRQEKARLK IDAILEENGV 

251 KP* 

ORF76a and ORF76-1 show 97.6% identity in 252 aa overlap: 



MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQKPDGQAIRND 
MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQKPDGQAIRND 



orf 7 6a . pep AVRRLQTLEVLKNRALKEGLDKDKDVQNRFKIAEASFYAEEYVRFLERSETVSESALRQF 
I I ! I I I I I I I ! I i I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I : : I 
orf 7 6-1 AVRRLQTLEVLKNRALKEGLDKDKDVQNRFKIAEASFYAEEYVRFLERSETVSEDELHKF 



orf 7 6a. pep 
orf76-l 



YERQIRMIKLQQVSFATEEEARQAQQLLLKGLSFEGLMKRYPNDEQAFDGFIMAQQLPEP 
YEQQIRMIKLQQVSFATEEEARQAQQLLLKGLSFEGLMKRYPNDEQAFDGFIMAQQLPEP 



)rf7 6-l 



LASQFAAMNRGDVTRDPVKLGERYYLFKLSEVGKNPDAQPFELVRNQLEQGLRQEKARLK 
I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
LA3QFAAMNRGDVTRDPVKLGERYYLFKLSEVGKNPDAQPFELVRNQLEQGLRQEKARLK 



irf76a .pep 
irf76-l 



IDAILEENGVKPX 
IDALLEENGVKPX 



Homologv vyith a predicted ORF from N. gonorrhoeae 

The aligned aa sequences of ORF76 and a predicted ORF (ORF76.ng) from N. gonorrhoeae of the 
N- and C-termini show 96.7 % and 100% identity in 30 and 31 overlap, respectively: 
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or f 7 6 . pep MKQKKTAAAVIAAMLAGFAAXKAPEIDPAL 

orf 7 6ng MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQRPDGQAIRND 
// 

or f 7 6 . peo ELVRNQLEQGLRQEKARLKIDALLEENGVKP 
I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I 
orf7 6ng VTRNPVKLGERYYLFKLGAVGKNPDAQPFELVRNQLEQGLRQEKARLKIDALLEENGVKP 

The complete length ORF76ng nucleotide sequence <SEQ ID 301> is: 



51 



ATGAAACAGA 
TTTTGCGGCA 
TGGTGGCGCA 
AGACCGGACG 
TTTGGAAGTT 
AGGATGTCCA 
GAGTACGTCC 
GCGTCAGTTT 
GCTTCGCAAC 
GGGCTGTCTT 
GTTCGACGGT 
agtttgCCGG 
GGCGAACGCT 
CGCGCAGCCT 
AGGAAAAAGC 
AaacCGTAA 



AAAAGACCGC 
GCCAAAGCAC 
GATCATGCAG 
GGCAGGCAAT 
TTGAAAAACA 
AAACCGCTTT 
GTTTTCTGGA 
TATGAGCGGC 
CGAAGAGGAG 
TTGAAGGGCT 
TTCATTATGG 
TATGAACCGT 
ATTACCTGTT 
TTCGAGTTGG 
CCGCTTGAAA 



TGCCGCAGTT 
CCGAAATCGA 
CAGGCAGACC 
CCGAAACGAT 
GGGCATTGAA 
AAAATCGCCG 
ACGTTCGGAA 
AAATCCGCAT 
GCGCGTCAGG 
GATGAAGCGT 
CGCAGCAGCT 
GGCGACGTTA 
CAAACTCGGC 
TCAGAAACCA 
ATCGATGCCC 



ATTGCTGCAA 
CCCGGCTTTG 
GGCATGCGGA 
GCCGTCCGCC 
GGAAGGTTTG 
AAGCGTCTTT 
ACGGTTTCCG 
GATCAAATTG 
CGCAGCAGCT 
TATCCGAACG 
TCCCGAGCCG 
CCCGCAATCC 
GCGGTCGGGA 
GTTGGAACAA 
TTTTGGAaga 



TGTTGGCAGG 
GTGGATACGC 
GCAGTCCCAA 
GGCTGCAAAC 
GATAAGGATA 
TTATGCCGAG 
AAAGCGCACT 
CAGCAGGTCA 
CCTGCTCAAA 
ACGAGCAGGC 
CTGGCTTcgc 
GGTCAAATTG 
AAAACCCCGA 
GGTTTGAGGC 
Aaacggtgtc 



This encodes a protein having amino acid sequence <SEQ ID 302>: 



1 MKOKKTAAAV lAAMLAGFAA AKA PEIDPAL VDTLVAQIMQ QADRHAEQSQ 

51 RPDGQAIRND AVRRLQTLEV LKNRALKEGL DKDKDVQNRF KIAEASFYAE 

101 EYVRFLERSE TVSESALRQF YERQIRMIKL QQVSFATEEE ARQAQQLLLK 

151 GLSFEGLMKR YPNDEQAFDG FIMAQQLPEP LASQFAGMNR GDVTRNPVKL 

201 GERYYLFKLG AVGKNPDAQP FELVRNQLEQ GLRQEKARLK IDALLEENGV 

251 KP* 

ORF76ng and ORF76-1 show 96.0% identity in 252 aa overlap 



)rf7 6-l.pep MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQKPDGQAIRND 
)rf 7 6ng MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQRPDGQAIRND 



)rf7 6-l.pep AVRRLQTLEVLKNRALKEGLDKDKDVQNRFKIAEASFYAEEYVRFLERSETVSEDELHKF 
)rf7 6ng AVRRLQTLEVLKNRALKEGLDKDKDVQNRFKIAEASFYAEEYVRFLERSETVSESALRQF 



. . pep YEQQIRMIKLQQVSFATEEEARQAQQLLLKGLSFEGLMKRYPNDEQAFDGFIMAQQLPEP 
I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
I YERQIRMIKLQQVSFATEEEARQAQQLLLKGLSFEGLMKRYPNDEQAFDGFIMAQQLPEP 



irf 7 6-1 . pep LASQFAAMNRGDVTRDPVKLGERYYLFKLSEVGKNPDAQPFELVRNQLEQGLRQEKARLK 
irf7 6ng LASQFAGMNRGDVTRNPVKLGERYYLFKLGAVGKNPDAQPFELVRNQLEQGLRQEKARLK 



irf76-l.pep IDALLEENGVKPX 
I I I I I I I I I I I I I 
irf7 6ng IDALLEENGVKPX 



Furthermore, ORF76ng shows significant homology to a B.subtilis export protein precursor: 
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spl P24327 I PR3A_BACS0 PROTEIN EXPORT PROTEIN PRSA PRECURSOR >gi | 9B227 I pir I I S152 6 9 
33K lipoprotein - Bacillus subtilis >gi|39782 (X57271) 33kDa lipoprotein 
[Bacillus subtilis] 

>gi I 2226124 Ignl I PID|e325181 (Y14077) 33kDa lipoprotein [Bacillus subtilis] 
>gi|2633331|gnl|PID|ell82997 (Z99109) molecular chaperonin [Bacillus subtilis] 
Length =292 
Score = 50.4 bits (118), Expect = le-05 

Identities = 48/199 (24%), Positives = 82/199 (41%), Gaps = 32/199 (16%) 

Query: 70 VLKNRALKEGLDK DKDVQNRFKIAEASF YAEEYVRFLERSETVSE 114 

VL ++ LDK DK++ N+ K + Y ++Y++ + E +++ 

Sbjct: 53 VLTQLVQEKVLDKKYKVSDKEIDNKLKEYKTQLGDQYTALEKQYGKDYLKEQVKYELLTQ 112 

Query: 115 SA LRQFYERQIRMIKLQQVSFATEEEARQAQQLLLKGLSFEGLMKRYPN 163 

A +++++E 1+ + A ++ A + ++ L KG FE L K Y 

Sbjct: 113 KAAKDNIKVTDADIKEYWEGLKGKIRASHILVADKKTAEEVEKKLKKGEKFEDLAKEYST 172 

Query: 164 DEQAFDG FIMAQQLPEPLASQFAAMNRGDVTRDPVKLGERYYLFKLSEVGKNPDA 218 

DAG F Q+E+ + G+V+ DPVK Y++ K +E D 

Sbjct: 173 DSSASKGGDLGWFAKEGQMDETFSKAAFKLKTGEVS-DPVKTQYGYHIIKKTEERGKYDD 231 

Query: 219 QPFELVRNQLEQGLRQEKA 237 

EL LEQ L A 
Sbjct: 232 MKKELKSEVLEQKLNDNAA 250 



Based on this analysis, including the presence of a putative leader sequence and a RGD motif in 
the gonococcal protein, it was predicted that the proteins from N.meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF76-1 (27.8kDa) was cloned in the pET vector and expressed in E.coli, as described above. The 
products of protein expression and purification were analyzed by SDS-PAGE. Figure lOA shows 
the results of affinity purification of the His-fusion protein, Purified His-fusion protein was used 
to immunise mice, whose sera were used for Western blot (Figure lOB), ELISA (positive result), 
and FACS analysis (Figure IOC). These experiments confirm that ORF76-1 is a surface-exposed 
protein, and that it is a useful immunogen. 



Example 36 



The following partial DNA sequence was identified in N.meningitidis <SEQ ED 303>: 

1 ATGAAAAAAT CTTTCCTTAC GCTTGTTCTG TATTCGTCTT TACTTACCGC 

51 CAGCGAAATT GCCTTACCCC TTGGAATTGG GGATTGAAAC CTTACCGGCG 

101 GCAAAAATTG CGGAAACGTT TGCGCTGACA TTTGTGATTG CTGCGCTGTA 

151 TCTGTTTGCG CGTAATAAGG TGACGCGTTT GTTGATTGCG GTGTTTTTTG 

201 CGTTCAGCAT TATTGCCAAC AATGTGCATT ACGCGGATTA TCAAAGCTGG 

251 ATGACG 

// 

1201 CAAACCGTAT TCGAGCAGCT GCAAAAGACT CCTGACGGCA 

1251 ACTGGCTGTT TGCCTATACC TCCGATCATG GCCAGTATGT TCGCCAAGAT 

1301 ATCTACAATC AAGGCACGGT GCAGCCCGAC AGCTATCTCG TGCCGCTAGT 

1351 GTTGTACAGC CCGGATAAGG CCGTGCAACA GGCTGCCAAC CAGGCTTTTG 

1401 CGCCTTGCGA GATTGCCTTC CATCAGCAGC TTTCAACGTT CCTGATTCAC 

14 51 ACGTTGGGCT ACGATATGCC GGTTTCAGGT TGTCGCGAAG GCTCGGTAAC 

1501 GGGCAACCTG ATTACGGGTG ATGCAGGCAG CTTGAACATT CGCGACGGCA 

1551 AGGCGGAATA TGTTTATCCG CAATGA 

This corresponds to the amino acid sequence <SEQ ID 304; 0RF81>: 

1 MKKSFLTLVL YSSLLTASEI AYPLELGIET LPAAKIAETF ALTFVIAALY 
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51 LFARNKVTRL LIAVFFAFSI lANNVHYADY QSWMT 

// 

401 ...QTVFEQL QKTPDGNWLF AYTSDHGQYV RQDIYNQGTV QPDSYLVPLV 

4 51 LYSPDKAVQQ AANQAFAPCE lAFHQQLSTF LIHTLGYDMP VSGCREGSVT 

501 GNLITGDAGS LNIRDGE(AEY VYPQ'' 

Further work revealed the complete nucleotide sequence <SEQ ED 305>: 

1 ATGAAAAAAT CTTTCCTTAC GCTTGTTCTG TATTCGTCTT TACTTACCGC 

51 CAGCGAAATT GCCTATCGCT TTGTATTTGG GATTGAAACC TTACCGGCGG 

101 CAAAAATTGC GGAAACGTTT GCGCTGACAT TTGTGATTGC TGCGCTGTAT 

151 CTGTTTGCGC GTTATAAGGT GACGCGTTTG TTGATTGCGG TGTTTTTTGC 

201 GTTCAGCATT ATTGCCAACA ATGTGCATTA CGCGGTTTAT CAAAGCTGGA 

251 TGACGGGCAT CAATTATTGG CTGATGCTGA AAGAGGTTAC CGAAGTCGGC 

301 AGCGCGGGTG CGTCGATGTT GGATAAGTTG TGGCTGCCTG TGTTGTGGGG 

351 CGTGTTGGAA GTCATGTTGT TTTGCAGCCT TGCCAAGTTC CGCCGTAAGA 

401 CGCATTTTTC TGCCGATATA CTGTTTGCCT TCCTAATGCT GATGATTTTC 

451 GTGCGTTCGT TCGACACGAA ACAAGAGCAC GGTATTTCGC CCAAACCGAC 

501 ATACAGCCGC ATCAAAGCCA ATTATTTCAG CTTCGGTTAT TTTGTCGGAC 

551 GCGTGTTGCC GTATCAGTTG TTTGATTTAA GCAGGATTCC CGCCTTTAAG 

601 CAGCCTGCTC CAAGCAAAAT CGGGCAGGGC AGTGTTCAAA ATATCGTCCT 

651 GATTATGGGC GAAAGCGAAA GCGCGGCGCA TTTGAAGCTG TTTGGCTACG 

701 GACGCGAAA.C TTCGCCGTTT TTAACCCGGC TGTCGCAAGC CGATTTTAAG 

751 CCGATTGTGA AACAAAGTTA TTCCGCAGGC TTTATGACTG CAGTGTCCCT 

801 GCCCAGTTTT TTCAATGCGA TACCGCACGC CAACGGCTTG GAACAAATCA 

851 GCGGCGGCGA TACCAATATG TTCCGCCTCG CCAAAGAGCA GGGCTATGAA 

901 ACGTATTTTT ACAGCGCGCA GGCGGAAAAC GAGATGGCGA TTTTGAACTT 

951 AATCGGTAAG AAATGGATAG ACCATCTGAT TCAGCCGACG CAACTTGGCT 

1001 ACGGCAACGG CGACAATATG CCCGATGAGA AGCTGCTGCC GTTGTTCGAC 

1051 AAT^TCAATT TGCAGCAGGG CAAGCATTTT ATCGTGTTGC ACCAACGCGG 

1101 TTCGCACGCC CCATACGGCG CATTGTTGCA GCCTCAAGAT AAAGTATTCG 

1151 GCGAAGCCGA TATTGTGGAT AAGTACGACA ACACCATCCA CAAAACCGAC 

1201 CAAATGATTC AAACCGTATT CGAGCAGCTG CAAAAGCAGC CTGACGGCAA 

1251 CTGGCTGTTT GCCTATACCT CCGATCATGG CCAGTATGTT CGCCAAGATA 

1301 TCTACAATCA AGGCACGGTG CAGCCCGACA GCTATCTCGT GCCGCTAGTG 

1351 TTGTACAGCC CGGATAAGGC CGTGCAACAG GCTGCCAACC AGGCTTTTGC 

1401 GCCTTGCGAG ATTGCCTTCC ATCAGCAGCT TTCAACGTTC CTGATTCACA 

1451 CGTTGGGCTA CGATATGCCG GTTTCAGGTT GTCGCGAAGG CTCGGTAACG 

1501 GGCAACCTGA TTACGGGTGA TGCAGGCAGC TTGAACATTC GCGACGGCAA 

1551 GGCGGAATAT GTTTATCCGC AATGA 

This corresponds to the amino acid sequence <SEQ ID 306; 0RF81-1>: 

1 MKKSFLTLVL YSSLLTASEI AYRFVFGIET LPAAKIAETF ALTFVIAALY 

51 LFARYKVTRL LIAVFFAFSI lANNVHYAVY QSWMTGINYW LMLKEVTEVG 

101 SAGASMLDKL WLPVLWGVLE VMLFCSLAKF RRKTHFSADI LFAFLMLMIF 

151 VRSFDTKQEH GISPKPTYSR IKANYFSFGY FVGRVLPYQL FDLSRIPAFK 

201 QPAPSKIGQG SVQNIVLIMG ESESAAHLKL FGYGRETSPF LTRLSQADFK 

251 PIVKQSYSAG FMTAVSLPSF FNAIPHANGL EQISGGDTNM FRLAKEQGYE 

301 TYFYSAQAEN EMAILNLIGK KWIDHLIQPT QLGYGNGDNM PDEKLLPLFD 

351 KINLQQGKHF IVLHQRGSHA PYGALLQPQD KVFGEADIVD KYDNTIHKTD 

401 QMIQTVFEQL QKQPDGNWLF AYTSDHGQYV RQDIYNQGTV QPDSYLVPLV 

451 LYSPDKAVQQ AANQAFAPCE lAFHQQLSTF LIHTLGYDMP VSGCREGSVT 

501 GNLITGDAGS LNIRDGKAEY VYPQ* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORf from N.tnenimitidis ("strain A) 

0RF81 shows 84.7% identity over a 85aa overlap and 99.2% identity over a 121aa overlap with 
an ORF (ORFSla) from sfrain A of A^. meningitidis: 

10 20 30 40 50 60 

orf 81 .pep MKKSFLTLVLYSSLLTAS EIAYPLELGIETLPAAK IAETFALTFVIAALYLF ARNKVTRL 

or f 8 la MKKSLFVLFLYSSLLTAS EIAYRFVFGIETLPAAK MAETFALTFVIAALYLF RRYKATRL 
10 20 30 40 50 50 
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LIAVFFAFSIIANNVH YADYQSWMT 

LIAVFFAFSIIANNVH YAVYQSWITGINYWLMLKEITEVGGAGASMLDKLW LPALWGVLE 



QTVFEQLQKTPDGNWLFAYTSDHGQYVRQD 
IPHANGLEQISGGDIVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLFAYTSDHGQYVRQD 



lYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTFLIHTLGYDMPVSG 
lYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTFLIHTLGYDMPVSG 



or f 8 1 . pep CREGSVTGNLITGDAGSLNIRDGKAEYVYPQX 

orfSla CREGSVTGNLITGDAGSLNIRDGKAEYVYPQX 
400 410 420 

The complete length 0RF81 a nucleotide sequence <SEQ ID 307> is: 

1 ATGAAAAAAT CCCTTTTCGT TCTCTTTCTG TATTCGTCCC TACTTACTGC 

51 CAGCGAAATT GCTTATCGCT TTGTATTCGG AATTGAAACC TTACCGGCTG 

101 CAAAAATGGC AGAAACGTTT GCGCTGACAT TTGTGATTGC TGCGCTGTAT 

151 CTGTTTGCGC GTTATAAGGC AACGCGTTTG TTGATTGCGG TGTTTTTCGC 

201 GTTCAGCATT ATTGCCAACA ATGTGCATTA CGCGGTTTAT CAAAGCTGGA 

251 TAACGGGCAT TAATTATTGG CTGATGCTGA AAGAGATTAC CGAAGTTGGC 

301 GGCGCAGGGG CGTCGATGTT GGATAAGTTG TGGCTGCCTG CGTTGTGGGG 

351 CGTGTTGGAA GTCATGTTGT TTTGCAGCCT TGCCAAGTTC CGCCGTAAGA 

4 01 CGCATTTTTC TGCCGATATA CTGTTTGCCT TCCTAATGCT GATGATTTTC 

4 51 GTGCGTTCGT TCGACACGAA ACAAGAACAC GGTATTTCGC CCAAACCGAC 

501 ATACAGCCGC ATCAAAGCCA ATTATTTCAG CTTCGGTTAT TTTGTCGGAC 

551 GCGTGTTGCC GTATCAGTTG TTTGATTTAA GCAAGATTCC TGTGTTCAAA 

601 CAGCCTGCTC CAAGCAGAAT CGGGCAAGGC AGTATTCAAA ATATCGTCCT 

651 GATTATGGGC GAAAGCGAAA GCGCGGCGCA TTTGAAATTG TTTGGCTACG 

701 GGCGCGAAAC TTCGCCGTTT TTGACCCAGC TTTCGCAAGC CGATTTTAAG 

7 51 CCGATTGTGA AACAAAGTTA TTCCGCAGGC TTTATGACGG CAGTATCCCT 

801 GCCCAGTTTC TTTAACGTCA TACCGCATGC CAACGGCTTG GAACAAATCA 

851 GCGGCGGCGA TATTGTGGAT AAGTACGACA ACACCATCCA CAAAACCGAC 

901 CAAATGATTC AAACCGTATT CGAGCAGCTG CAAAAGCAGC CTGACGGCAA 

951 CTGGCTGTTT GCCTATACCT CCGATCATGG CCAGTATGTT CGCCAAGATA 

1001 TCTACAATCA AGGCACGGTG CAGCCCGACA GCTATCTCGT GCCGCTGGTG 

1051 TTGTACAGCC CGGATAAGGC CGTGCAACAG GCTGCCAACC AGGCTTTTGC 

1101 GCCTTGCGAG ATTGCCTTCC ATCAGCAGCT TTCAACGTTC CTGATTCACA 

1151 CGTTGGGCTA CGATATGCCG GTTTCAGGTT GTCGCGAAGG CTCGGTAACG 

1201 GGCAACCTGA TTACGGGTGA TGCAGGCAGC TTGAACATTC GCGACGGCAA 

1251 GGCGGAATAT GTTTATCCGC AATGA 

This encodes a protein having amino acid sequence <SEQ ID 308>: 

1 MKKSLFVLFL YSSLLTAS EI AYRFVFGIET LPAAK MAETF ALTFVIAALY 

51 LFARYKATR L LIAVFFAFSI lANNVH YAVY QSWITGINYW LMLKEITEVG 

101 GAGASMLDKL W LPALWGVLE VMLFCSLA KF RRKT HFSADI LFAFLMLMIF 

151 VRSFDTKQEH GISPKPTYSR IKANYFSFGY FVGRVLPYQL FDLSKIPVFK 

201 QPAPSRIGQG SIQNIVLIMG ESESAAHLKL FGYGRETSPF LTQLSQADFK 

251 PIVKQSYSAG FMTAVSLPSF FNVIPHANGL EQISGGDIVD KYDNTIHKTD 

301 QMIQTVFEQL QKQPDGNWLF AYTSDHGQYV RQDIYNQGTV QPDSYLVPLV 

351 LYSPDKAVQQ AANQAFAPCE lAFHQQLSTF LIHTLGYDMP VSGCREGSVT 

401 GNLITGDAGS LNIRDGKAEY VYPQ* 

ORFSla and 0RF81-1 show 77.9% identity in 524 aa overlap: 



MKKSLFVLFLYSSLLTASEIAYRFVFGIETLPAAKMAETFALTFVIAALYLFARYKATRL 
I I I I : : : I I I I I I I I I I I I I I I I I I I I I I I I i I I : I I I I I I I I i I I I I I I I I I I I : I I I 
MKKSFLTLVLYSSLLTASEIAYRFVFGIETLPAAKIAETFALTFVIAALYLFARYKVTRL 
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irf 81a . pep LIAVFFAFSIIANNVHYAVYQSWITGINYWLMLKEITEVGGAGASMLDKLWLPALWGVLE 
,rf81-l LIAVFFAFSIIANNVHYAVYQSWMTGINYWLMLKEVTEVGSAGASMLDKLWLPVLWGVLE 



orfSla.pep VMLFCSLAKFRRKTHFSADILFAFLMLMIFVRSFDTKQEHGISPKPTYSRIKANYFSFGY 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf81-l VMLFCSLAKFRRKTHFSADILFAFLMLMIFVRSFDTKQEHGISPKPTYSRIKANYFSFGY 



FVGRVLPYQLFDLSKIPVFKQPAPSRIGQGSIQNIVLIMGESESAAHLKLFGYGRETSPF 
FVGRVLPYQLFDLSRIPAFKQPAPSKIGQGSVQNIVLIMGESESAAHLKLFGYGRETSPF 



■rf 81a. pep LTQL3QADFKPIVKQSYSAGFMTAVSLPSFFNVIPHANGLEQISGGD 

.rf81-l LTRLSQADFKPIVKQSYSAGFMTAVSLPSFFNAIPHANGLEQISGGDTNMFRLAKEQGYE 



orf81-l TYFYSAQAENEMAILNLIGKKWIDHLIQPTQLGYGNGDNMPDEKLLPLFDKINLQQGKHF 
310 320 330 340 350 360 

290 300 310 320 

orfSla.pep IVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLF 

orf81-l IVLHQRGSHAPYGALLQPQDECVFGEADIVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLF 
370 380 390 400 410 420 

330 340 350 360 370 380 

orf 81a . pep AYTSDHGQYVRQDIYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTF 

orf81-l AYTSDHGQYVRQDIYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTF 
430 440 450 460 470 480 

390 400 410 420 

orf 81a. pep LIHTLGYDMPVSGCREGSVTGNLITGDAGSLNIRDGKAEYVYPQX 

orf 81-1 LIHTLGYDMPVSGCREGSVTGNLITGDAGSLNIRDGKAEYVYPQX 
490 500 510 520 

Homology with a predicted ORF from N. gonorrhoeae 

The ahgned aa sequences of 0RF81 and a predicted ORF (ORFSl.ng) from A'', gonorrhoeae of the 
N- and C-termini show 82.4 % and 97.5% identity in 85 and 121 overlap, respectively: 

orf 81 . pep MKKSFLTLVLYSSLLTASEIAYPLELGIETLPAAKIAETFALTFVIAALYLFARNKVTRL 60 

I I I I : : : I I I I I I I I I I I I I I : : I I I I I I I I I : I I I I I I I I : I I I I I ! I I ! I : : I I 
orfSlng MKKSLFVLFLYSSLLTASEIAYRFVFGIETLPAAKMAETFALTFMIAALYLFARYKASRL 60 

orf 81. pep LIAVFFAFSIIANNVHYADYQSWMT 85 

orfSlng LIAVFFAFSMIANNVHYAVYQSWMTGINYWLMLKEVTEVGSAGASMLDKLWLPALWGVAE 120 

// 

orf 81. pep QTVFEQLQKTPDGNWLFAYTSDHGQYVRQD 433 

orf81ng ALLQPQDKVFGEADIVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLFAYTSDHGQYVRQD 433 

orf 81 .pep lYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTFLIHTLGYDMPVSG 4 93 

I I I I I I I I I I I I : I 1 I I I I I I I ! I I I I 1 I I I I I i I I I I I I I I I I I I I I I I I I I I I I ! I i 
orf81ng lYNQGTVQPDSYIVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTFLIHTLGYDMPVSG 4 93 
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orfSl.pep CREGSVTGNLITGDAGSLNIRDGKAEYVYPQ 524 

I M I I M I I I I I M I I I I I I I : I I I I I I I I I 

orfSlng CREGSVTGNLITGDAGSLNIRNGKAEYVYPQ 524 

The complete length ORFSlng nucleotide sequence <SEQ ID 309> is: 

1 ATGAAAAAAT CCCTTTTCGT TCTCTTTCTG TATTCATCCC TACTTACCGC 

51 CAGCGAAATC GCCTATCGCT TTGTATTCGG AATTGAAACC TTACCGGCTG 

101 CAAAAATGGC GGAAACGTTT GCGCTGACAT TTATGATTGC TGCGCTGTAT 

151 CTGTTTGCGC GTTATAAGGC TTCGCGGCTG CTGATTGCGG TGTTTTTCGC 

201 GTTCAGCATG ATTGCCAACA ATGTGCATTA CGCGGTTTAT CAAAGCTGGA 

251 TGACGGGTAT TAACTATTGG CTGATGCTGA AAGAGGTTAC CGAAGTCGGC 

301 AGCGCGGGCG CGTCGATGTT GGATAAGTTG TGGCTGCCTG CTTTGTGGGG 

351 CGTGGCGGAA GTCATGTTGT TTTGCAGCCT TGCCAAGTTC CGCCGTAAGA 

401 CGCATTTTTC TGCCGATATA CTGTTTGCCT TCCTAATGCT GATGATTTTC 

451 GTGCGTTCGT TCGACACGAA ACAAGAGCAC GGTATTTCGC CCAAACCGAC 

501 ATACAGCCGC ATCAAAGCCA ATTATTTCAG CTTCGGTTAT TTTGTCGGGC 

551 GCGTGTTGCC GTATCAGTTG TTTGATTTAA GCAAGATCCC TGTGTTCAAA 

601 CAGCCTGCTC CAAGCAAAAT CGGGCAAGGC AGTATTCAAA ATATCGTCCT 

651 GATTATGGGC GAAAGCGAAA GCGCGGCGCA TTTGAAATTG TTTGGTTACG 

701 GGCGCGAAAC TTCGCCGTTT TTAACCCGGC TGTCGCAAGC CGATTTTAAG 

751 CCGATTGTGA AACAAAGTTA TTCCGCAGGC TTTATGACGG CAGTATCCCT 

801 GCCCAGTTTC TTTAACGTCA TACCGCACGC CAACGGCTTG GAACAAATCA 

851 GCGGCGGCGA TACCAATATG TTCCGCCTCG CCAAAGAGCA GGGCTATGAA 

901 ACGTATTTTT ACAGTGCCCA GGCTGAAAAC CAAATGGCAA TTTTGAACTT 

951 AATCGGTAAG AAATGGATAG ACCATCTGAT TCAGCCGACG CAACTTGGCT 

1001 ACGGCAACGG CGACAATATG CCCGATGAGA AGCTGCTGCC GTTGTTCGAC 

1051 AAAATCAATT TGCAGCAGGG CAGGCATTTT ATCGTGTTGC ACCAACGCGG 

1101 TTCGCACGCC CCATACGGCG CATTGTTGCA GCCTCAAGAT AAAGTATTCG 

1151 GCGAAGCCGA TATTGTGGAT AAGTACGACA ACACCATCCA CAAAACCGAC 

1201 CAAATGATTC AAACCGTATT CGAGCAGCTG CAAAAGCAGC CTGACGGCAA 

1251 CTGGCTGTTT GCCTATACCT CCGATCATGG CCAGTATGTG CGCCAAGATA 

1301 TCTACAATCA AGGCACGGTG CAGCCCGACA GCTATATTGT GCCTCTGGTT 

1351 TTGTACAGCC CGGATAAGGC CGTGCAACAG GCTGCCAACC AGGCTTTTGC 

14 01 GCCTTGCGAG ATTGCCTTCC ATCAGCAGCT TTCAACGTTC CTGATTCACA 

1451 CGTTGGGCTA CGATATGCCG GTTTCAGGTT GTCGCGAAGG CTCGGTAACA 

1501 GGCAACCTGA TTACGGGCGA TGCAGGCAGC TTGAACATTC GCAACGGCAA 

1551 GGCGGAATAT GTTTATCCGC AATAA 

This encodes a protein having amino acid sequence <SEQ ID 310>: 



MKKSLFVLFL YSSLLTASEI AYRFVFGIET LPAAKMAET F ALTFMIAALY 
LFARYKASRL LIAVFFAFSM lANNVH YAVY QSWMTGINYW LMLKEVTEVG 
SAGASMLDKL W LPALWGVAE VMLFCSLA KF RRKT HFSADI LFAFLMLMIF 
VRSFDTKQEH GISPKPTYSR IKANYFSFGY FVGRVLPYQL FDLSKIPVFK 
QPAPSKIGQG SIQNIVLIMG ESESAAHLKL FGYGRETSPF LTRLSQADFK 
PIVKQSYSAG FMTAVSLPSF FNVIPHANGL EQISGGDTNM FRLAKEQGYE 
TYFYSAQAEN QMAILNLIGK KWIDHLIQPT QLGYGNGDNM PDEKLLPLFD 
KINLQQGRHF IVLHQRGSHA PYGALLQPQD KVFGEADIVD KYDNTIHKTD 
QMIQTVFEQL QKQPDGNWLF AYTSDHGQYV RQDIYNQGTV QPDSYIVPLV 
LYSPDKAVQQ AANQAFAPCE lAFHQQLSTF LIHTLGYDMP VSGCREGSVT 
GNLITGDAGS LNIRNGKAEY VYPQ* 



ORFSlng and 0RF81-1 show 96.4% identity in 524 aa overlap: 



MKKSLFVLFLYSSLLTASEIAYRFVFGIETLPAAKMAETFALTFMIAALYLFARYKASRL 
I I I I : : : I I I I I I I I I I I I I ! I I I I I I I I I ! I I I : I I I I I I I I : I I I I I I I I I I I :: I I 
MKKSFLTLVLYSSLLTASEIAYRFVFGIETLPAAKIAETFALTFVIAALYLFARYKVTRL 



irf Sing- 1. pep LIAVFFAFSMIANNVHYAVYQSWMTGINYWLMLKEVTEVGSAGASMLDKLWLPALWGVAE 
)rf81-l LIAVFFAFSIIANNVHYAVYQ3WMTGINYWLMLKEVTEVGSAGASMLDKLWLPVLWGVLE 



VMLFCSLAKFRRKTHFSADILFAFLMLMIFVRSFDTKQEHGISPKPTYSRIKANYFSFGY 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I 
VMLFCSLAKFRRKTHFSADILFAFLMLMIFVRSFDTKQEHGISPKPTYSRIKANYFSFGY 
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orf 81ng-l .pep 
orf81-l 



FVGRVLPYQLFDLSKIPVFKQPAPSKIGQGSIQNIVLIMGESESAAHLKLFGYGRETSPF 



orf 81ng-l .pep 
orf81-l 



LTRLSQADFKPIVKQSYSAGFMTAVSLPSFFNVIPHANGLEQISGGDTNMFRLAKEQGYE 
LTRLSQADFKPIVKQSYSAGFMTAVSLPSFFNAIPHANGLEQISGGDTNMFRLAKEQGYE 



orfSlng-l.pep 
orf81-l 



TYFYSAQAENQMAILNLIGKKWIDHLIQPTQLGYGNGDNMPDEKLLPLFDKINLQQGRHF 
I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I I I I ! I i : I I 
TYFYSAQAENEMAILNLIGKKWIDHLIQPTQLGYGNGDNMPDEKLLPLFDKINLQQGKHF 



orfSlng-l.j 
orf81-l 



IVLHQRGSHAPYGALLQPQDJCVFGEADIVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLF 
IVLHQRGSHAPYGALLQPQDKVFGEADIVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLF 



orf Blng-l .pep 
orf81-l 



AYTSDHGQYVRQDIYNQGTVQPDSYIVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTF 

I I I I I I I I I I I I I 1 I I I I I I I I I I I : I I I I I I I I I I I I I I I I I i ! I I i I I I I I I I I I I I I 
AYTSDHGQYVRQDIYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTF 



orf 81ng-l . pep LIHTLGYDMPVSGCREGSVTGNLITGDAGSLNIRNGKAEYVYPQX 

I I I I I I I I I I I I I I I I I I I I ! I I I t I I I ! I I I I : I I I I I I I ! I 
orf 81-1 LIHTLGYDMPVSGCREGSVTGNLITGDAGSLNIRDGKAEYVYPQX 
490 500 510 520 

Furthermore, ORFSlng shows significant homology to an E.coli OMP: 



iciated protein [E. 



gi 11256380 (050906) outer membrane adherence proted 
coli] Length = 547 
Score =87.4 bits (213), Expect = 2e-16 

Identities = 122/468 (26%), Positives = 198/468 (42%), Gaps = 70/468 (1' 



SWMT GINYWLMLKEVTEVGSAGASMLDKLWLPALWGVAEVMLFCSLAKFRRKT 134 

SW T G ++ + EV A ML ++ P L A + L + 
SWCTFGTTFNDGFAISVLQSDPDEV AKMLG-MYSPYLCAFAFLSLLFLAVIIKYDV 141 



45 




25 








Sbjct: 


29 






82 


50 


Sbjct: 


87 






135 


55 


Sbjct: 


142 








184 




Sbjct: 


202 


60 




242 




Sbjct: 


258 




Query: 


299 


65 






Sbjct: 


311 






356 


70 


Sbjct: 


360 



++T++ S+Q+ 



-DTKQEHGISPKPTYSRIKAN— YFSFGYFVG 183 



+ +P F+ + I VLI+GES ++ L+GY R T+P + 

U^TVPYFQL SVRDTGIDTYVLIVGESVRVDNMSLYGYTRSTTPQV 257 

?KQSYSAGFMTAVSLP SFFNVIPHANGLEQISGGDTNMFRLAKEQG 298 

Q+ S TA+S+P + +V+ H I N+ +A + G 

rNQAISGAPYTALSVPLSLTADSVLSH DIHNYPDNIINMANQAG 310 

— ENQMAILNLIGKKWIDHLIQPTQLGYGNGDNMPDEKLLPLFDKINLQ 355 

+N A+ ++ ++ + Y G DE LLP + Q 
rRQNGTAVTSI AMRAMETVYVRGF DELLLPHLSQALQQ 359 
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Query: 413 QPDGNWLFAYTSDHG QYVRQDIYNQG— TVQPDSYIVPL-VLYSP 454 

D Y +DHG ++++Y G +Y VP+ + YSP 

Sbjct: 419 — DRRASVMYFADHGLERDPTKKNVYFHGGREASQQAYHVPMFIWYSP 464 

Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 37 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 311>: 

1 ... ACCCTGCTCC TCTTCATCCC CCTCGTCCTC ACAC . GTGCG GCACACTGAC 

51 CGGCATACTC GCCCaCGGCG GCGGCAAACG CTTTGCCGTC GAACAAGAAC 

101 TCGTCGCCGC ATCGTCCCGC GCCGCCGTCA AAGAAATGGA TTTGTCCGCC 

151 yTAAAAGGAC GCAAAGCCGC CyTTTACGTC TCCGTTATGG GCGACCAAGG 

201 TTCGGGCAAC ATAAGCGGCG GACGCTACTC TATCGACGCA CTGATACGCG 

251 GCGGCTACCA CAACAACCCC GAAAGTGCCA CCCAATACAG CTACCCCGCC 

301 TACGACACTA CCGCCACCAC CAAATCCGAC GCGCTCTCCA GCGTAACCAC 

351 TTCCACATCG CTTTTGAACG CCCCCGCCGC CGyCyTGACG AAAAACAGCG 

401 GACGCAAAGG CGAACGcTCC GCCGGACTGT CCGTCAACGG CACGGGCGAC 

451 TACCGCAACG AAACCCTGCT CGCCAACCCC CGCGACGTTT CCTTCCTGAC 

501 CAACCTCATC CAAACCGTCT TCTACCTGCG CGGCATCGAA GTCgTACCGC 

551 CCGrATACGC CGACACCGAC GTATTCGTAA CCGTCGACGT A. . . 

This corresponds to the amino acid sequence <SEQ ID 312; ORF83>: 



1 ..TLLLFIPLVL TXCGTLTGIL AHGGGKRFAV EQELVAASSR AAVKEMDLSA 
51 LKGRKAAXYV SVMGDQGSGN ISGGRYSIDA LIRGGYHNNP ESATQYSYPA 
101 YDTTATTKSD ALSSVTTSTS LLNAPAAXLT KNSGRKGERS AGLSVNGTGD 
151 YRNETLLANP RDVSFLTNLI QTVFYLRGIE WPPXYADTD VFVTVDV.. 

Further work revealed the complete nucleotide sequence <SEQ ID 313>: 

1 ATGAAAACCC TGCTCCTCCT CATCCCCCTC GTCCTCACAG CCTGCGGCAC 

51 ACTGACCGGC ATACCCGCCC ACGGCGGCGG CAAACGCTTT GCCGTCGAAC 

101 AAGAACTCGT CGCCGCATCG TCCCGCGCCG CCGTCAAAGA AATGGATTTG 

151 TCCGCCCTAA AAGGACGCAA AGCCGCCCTT TACGTCTCCG TTATGGGCGA 

201 CCAAGGTTCG GGCAACATAA GCGGCGGACG CTACTCTATC GACGCACTGA 

251 TACGCGGCGG CTACCACAAC AACCCCGAAA GTGCCACCCA ATACAGCTAC 

301 CCCGCCTACG ACACTACCGC CACCACCAAA TCCGACGCGC TCTCCAGCGT 

351 AACCACTTCC ACATCGCTTT TGAACGCCCC CGCCGCCGCC CTGACGAAAA 

401 ACAGCGGACG CAAAGGCGAA CGCTCCGCCG GACTGTCCGT CAACGGCACG 

451 GGCGACTACC GCAACGAAAC CCTGCTCGCC AACCCCCGCG ACGTTTCCTT 

501 CCTGACCAAC CTCATCCAAA CCGTCTTCTA CCTGCGCGGC ATCGAAGTCG 

551 TACCGCCCGA ATACGCCGAC ACCGACGTAT TCGTAACCGT CGACGTATTC 

601 GGCACCGTCC GCAGCCGTAC CGAACTGCAC CTCTACAACG CCGAAACCCT 

651 TAAAGCCCAA ACCAAGCTCG AATATTTCGC CGTTGACCGC GACAGCCGGA 

701 AACTGCTGAT TACCCCTAAA ACCGCCGCCT ACGAATCCCA ATACCAAGAA 

751 CAATACGCCC TTTGGACCGG CCCTTACAAA GTCAGCAAAA CCGTCAAAGC 

801 CTCAGACCGC CTGATGGTCG ATTTCTCCGA CATTACCCCC TACGGCGACA 

851 CAACCGCCCA AAACCGTCCC GACTTCAAAC AAAACAACGG TAAAAAACCC 

901 GATGTCGGCA ACGAAGTCAT CCGCCGCCGC AAAGGAGGAT AA 

This corresponds to the amino acid sequence <SEQ ID 314; ORF83-l>: 



1 MKTLLLLIPL VLTA CGTLTG IPAHGGGKRF AVEQELVAAS SRAAVKEMDL 

51 SALKGRKAAL YVSVMGDQGS GNISGGRYSI DALIRGGYHN NPESATQYSY 

101 PAYDTTATTK SDALSSVTTS TSLLNAEAAA LTKNSGRKGE RSAGLSVNGT 

151 GDYRNETLLA NPRDVSFLTN LIQTVFYLRG lEWPPEYAD TDVFVTVDVF 

201 GTVRSRTELH LYNAETLKAQ TKLEYFAVDR DSRKLLITPK TAAYESQYQE 

251 QYALWTGPYK VSKTVKASDR LMVDFSDITP YGDTTAQNRP DFKQNNGKKP 
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301 DVGNEVIRRR KGG* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF83 shows 96.4% identity over a 197aa overlap with an ORF (ORF83a) from strain A of A^. 
meningitidis: 



orf 83 .pep 
orf83a 



TLLLFIPLVLTX CGTLTGILAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAX 
III : I II II I II I I I II I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
MKTLLXLIPLVLTACGTLTGIPAHGGGKRFAVEQELVAASSRAAVKEMDLSfiLKGRKAAL 



orf 83 .pep 
orf83a 



YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS 

II I II I II I I I I I II I I I I I I I I I I II I I I I I I II I I I I I I II I I I I I I I I I I II I I I I I 
YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS 



orf 83. pep 
orf83a 



TSLLNAPAAXLTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG 

I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I II II I II II I I 

TSLLNAPAAALTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG 



orf 83 . pep lEWPPXYADTDVFVTVDV 

orf 83a IEWPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLIAPK 
190 200 210 220 230 240 

The complete length ORF83a nucleotide sequence <SEQ ID 315> is: 

1 ATGAAAACCC TGCTCNTCCT CATCCCCCTC GTCCTCACAG cctgcggcac 

51 ACTGACCGGC ATACCCGCCC ACGGCGGCGG CAAACGCTTT GCCGTCGAAC 

101 AAGAACTCGT CGCCGCATCG TCCCGCGCCG CCGTCAAAGA aatggacttg 

151 tccgccctga aaggacgcaa agccgccctt tacgtctccg ttatgggcga 

201 ccaaggttcg ggcaacataa gcggcggacg ctactctatc gacgcactga 

251 tacgcggcgg ctaccacaac aaccccgaaa gtgccaccca atacagctac 

301 cccgcctacg acactaccgc caccaccaaa tccgacgcgc tctccagcgt 

351 aaccacttcc acatcgcttt tgaacgcccc cgccgccgcc ctgacgaaaa 

401 acagcggacg caaaggcgaa cgctccgccg gactgtccgt caacggcacg 

451 ggcgactacc gcaacgaaac cctgctcgcc aacccccgcg acgtttcctt 

501 cctgaccaac ctcatccaaa ccgtcttcta cctgcgcggc atcgaagtcg 

551 TACCGCCCGA ATACGCCGAC ACCGACGTAT TCGTAACCGT CGACGTATTC 

601 GGCACCGTCC GCAGCCGCAC CGAACTGCAC CTCTACAACG CCGAAACCCT 

651 TAAAGCCCAA ACCAAGCTCG AATATTTCGC CGTTGACCGC GACAGCCGGA 

701 AACTGCTGAT TGCCCCTAAA ACCGCCGCCT ACGAATCCCA ATACCAAGAA 

751 CAATACGCCC TCTGGATGGG ACCTTACAGC GTCGGCAAAA CCGTCAAAGC 

801 CTCAGACCGC CTGATGGTCG ATTTCTCCGA CATCACCCCC TACGGCGACA 

851 CAACCGCCCA AAACCGTCCC GACTTCAAAC AAAACAACGG TAAAAAACCC 

901 GATGTCGGCA ACGAAGTCAT CCGCCGCCGC AAAGGAGGAT AA 

This encodes a protein having amino acid sequence <SEQ ID 316>: 

1 MKTLLXLIPL VLTA CGTLTG IPAHGGGKRF AVEQELVAAS SRAAVKEMDL 

51 SALKGRKAAL YVSVMGDQGS GNISGGRYSI DALIRGGYHN NPESATQYSY 

101 PAYDTTATTK SDALSSVTTS TSLLNAPAAA LTKNSGRKGE RSAGLSVNGT 

151 GDYRNETLLA NPRDVSFLTN LIQTVFYLRG lEWPPEYAD TDVFVTVDVF 

201 GTVRSRTELH LYNAETLKAQ TKLEYFAVDR DSRKLLIAPK TAAYESQYQE 

251 QYALWMGPYS VGKTVKASDR LMVDFSDITP YGDTTAQNRP DFKQNNGKKP 

301 DVGNEVIRRR KGG* 



ORF83a and ORF83-1 show 98.4% identity in 313 aa overlap: 



10 20 30 40 50 60 

orf 83a. pep MKTLLXLIPLVLTACGTLTGIPAHGGGKRFAVEQELVAAS3RAAVKEMDLSALKGRKAAL 
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MKTLLLLIPLVLTACGTLTGIPAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL 



YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I ! I I i I I I I I I I I I I I I I I I I I I I 

YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS 



irf B3a . pep T3LLNAPAAALTKN3GRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG 
irfB3-l TSLLNAPAAALTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG 



irfBBa.pep lEWPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLIAPK 
irfB3-l lEWPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLITPK 



TAAYESQYQEQYALWMGPYSVGKTVKASDRLMVDFSDITPYGDTTAQNRPDFKQNNGKKP 
TAAYESQYQEQYALWTGPYKVSKTVKASDRLWDFSDITPYGDTTAQNRPDFKQNNGKKP 



30 or f 8 3a. pep DVGNEVIRRRKGGX 

orf83-l DVGNEVIRRRKGGX 
310 

35 Homology with a predicted ORF from N. gonorrhoeae 

ORF83 shows 94.9% identity over a 197aa overlap with a predicted ORF (ORF83.ng) from A^. 
gonorrhoeae: 

orf 83 .pep TLLLFIPLVLTXCGTLTGILAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAX 58 

40 orf83ng MKTLLLLIPLVLTACGTLTGIPAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL 60 

orf 83. pep YVSVMGDQGSGNISGGRY3IDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS 118 

orf83ng YVSVMGDQGSGNI3GGRY3IDALIRGGYHNNPDSATRYSYPAYDTTATTKSDALSGVTTS 120 

45 

orf 83. pep TSLLNAPAAXLTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG 17 8 

orfBSng TSLLNAPAAALTKNNGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG 180 

50 orf 83. pep lEWPPXYADTDVFVTVDV 197 

orf83ng lEWPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLIAPK 240 

The complete length ORF83ng nucleotide sequence <SEQ ID 317> is: 

1 ATGAAAACCC TGCTCCTCCT CATCCCCCTC GTACTCACCG CCTGCGGCAC 

55 51 ACTGACCGGC ATACCCGCCC ACGGCGGCGG CAAACGCTTT GCCGTCGAAC 

101 AGGAACTCGT CGCCGCATCG TCCCGCGCCG CCGTCAAAGA AATGGACTTG 

151 TCCGCCCTGA AAGGACGCAA AGCCGCCCTT TACGTCTCCG TTATGGGCGA 

201 CCAAGGTTCG GGCAACATAA GCGGCGGACG CTACTCCATC GACGCACTGA 

251 TACGCGGCGG CTACCACAAC AACCCCGACA GCGCCACCCG ATACAGCTAC 

60 301 CCCGCCTATG ACACTACCGC CACCACCAAA TCCGACGCGC TCTCCGGCGT 

351 AACCACTTCC ACATCGCTTT TGAACGCCCC CGCCGCCGCC CTGACGAAAA 

4 01 ACAACGGACG CAAAGGCGAA CGCTCCGCCG GACTGTCCGT CAACGGCACG 

4 51 GGCGACTACC GCAACGAAAC CCTGCTCGCC AACCCCCGCG ACGTTTCCTT 

501 CCTGACCAAC CTCATCCAAA CCGTCTTCTA CCTGCGCGGC ATCGAAGTCG 

65 551 TACCGCCCGA ATACGCCGAC ACCGACGTAT TCGTAACCGT CGACGTATTC 

601 GGCACCGTCC GCAGCCGTAC CGAACTGCAC CTCTACAACG CCGAAACCCT 
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651 TAAAGCCCAA ACCAAGCTCG AATATTTCGC CGTCGRCCGC GACAGCCGGA 

7 01 AACTGCTGAT TGCCCCTAAA ACCGCCGCCT ACGARTCCCA ATACCAAGAA 

751 CAATACGCCC TCTGGATGGG ACCTTACAGC GTCGGCAAAA CCGTCAAAGC 

801 CTCAGACCGC CTGATGGTCG ATTTCTCCGA CATCACCCCC TACGGCGACA 

5 851 CAACCGCCCA AAACCGTCCC GACTTCAAAC AAAACAACGG TAAAAACCCC 

901 GATGTCGGCA ACGAAGTCAT CCGCCGCCGC AAAGGAGGAT AA 

This encodes a protein having amino acid sequence <SEQ ID 3 1 8>: 

1 MKTL LLLIPL VLTAC GTLTG IPAHGGGKRF AVEQELVAAS SRAAVKEMDL 

51 SALKGRKAAL YVSVMGDQGS GNISGGRYSI DALIRGGYHN NPDSATRYSY 

10 101 PAYDTTATTK SDALSGVTTS TSLLNAPAAA LTKNNGRKGE RSAGLSVNGT 

151 GDYRNETLLA NPRDVSFLTN LIQTVFYLRG lEWPPEYAD TDVFVTVDVF 

2 01 GTVRSRTELH LYNAETLKAQ TKLEYFAVDR DSRKLLIAPK TAAYESQYQE 
251 OYALWM GPYS VGKT VKASDR LMVDFSDITP YGDTTAQNRP DFKQNNGKNP 

3 01 DVGNEVIRRR KGG* 

15 ORF83ng and ORF83-1 show 97.1% identity in 313 aa overlap 



JPLVLTACGTLTGIPAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL 
MKTLLLLIPLVLTACGTLTGIPAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRIG^ 



irf 8 3-l.pep YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS 
irf83ng YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPDSATRYSYPAYDTTATTKSDALSGVTTS 



or f 8 3-1. pep TSLLNAPAAALTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG 

orf 83ng TSLLNAPAAALTKNNGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG 
130 140 150 160 170 180 

190 200 210 220 230 240 

orf 83-1. pep lEWPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLITPK 

I I I I I I : I I 

orf83ng lEWPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLIAPK 



orf83ng TAAYESQYQEQYALWMGPYSVGKTVKASDRLMVDFSDITPYGDTTAQNRPDFKQNNGKNP 
250 260 270 280 290 300 

310 

orf 83-1. pep DVGNEVIRRRKGGX 
I I I I I I I I I I I I I I 
orf83ng DVGNEVIRRRKGGX 
310 

Based on this analysis, including the presence of a putative ATP/GTP-binding site motif A (P-loop) 
in the gonococcal protein (double-underlined) and a putative prokaryotic membrane lipoprotein 
lipid attachment site (single-underlined), it is predicted that the proteins from N. meningitidis and 
N.gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 
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Example 38 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 
319>: 



1 ATGGCAGAGA TCTGTTTGAT AACCGGCACG CCCGGTTCAG GGAAAACATT 

51 AAAAATGGTT TCCATGATGG CGAATGATGA AATGTTTAAG CCTGATGAAA 

101 AAGCCATACG CCGTAAAGTA TTTACGAACA TAAAAGGCTT GAAAATACCG 

151 CACACCTACA TAGAAACGGA CGCAAAAAAG CTGCCGAAAT CGACAGATGA 

201 GCAGCTTTCG GCGCATGATA TGTACGAATG GATAAAGAAG CCCGAAAATA 

251 TCGGGTCTAT TGTCATTGTA GATGAAGCTC AAGACGTATG GCCGGCACGC 

301 TCGGCAGGTT CAAAAATCCC TGAAAATGTC CAATGGCTGA ATACGCACAG 

351 ACATCAGGGC ATTGATATAT TTGTTTTGAC TCAAGGTCCT AAGCTTCTAG 

4 01 ATCAAAATCT TAGAACGCTT GTACGGAAAC ATTACCACAT CGCTTCAAAC 

4 51 AAGATGGGTA TGCGTACGCT TTTAGAATGG AAAATATGCG CGGACGATCC 

501 CGTAAAAATG GCATCAAGCG CATTCTCCAG TATCTATACA CTGGATAAAA 

551 AAGTTTATGA CTTGTAysrr TmmGCGGAAG TTCATACCGT AAATAAGGTC 

601 AAGCGGTCAA AGTGGTTTTA CACTCTGCCa GTAATAGTAT TGCTGATTCC 

651 CGTGTTTGTC GGCCTGTCCT ATAAAATGTT GagCaGTTAC GGAAAAAAAC 

7 01 aGGAAGAACC CGCAGCACAA GAATCGGCGG CAACAGAACA GCAGGCAGTA 
751 CTTCCGGATA AAACAGAAGG CGAGCCGGTA AATAACGGCA ACCTTACCGC 
801 AGATATGTTT GTTCCGACAT TGTCCGAaAA ACCCGrAAGC AAGCcgaTTT 

8 51 ATAACGGTGT AAGGCAGGTA AGAACCTTTG AATATATAGC AGGCTGTATA 
901 GAAGGCGGAA GAACCGGATG CGCCTGCTAT TCGCaTCAAG GGACGGCATt 
951 gaAAGAAGTG ACGGaGTTGA TGTGcgaAgG aCTATGTaAA AAacGGCTTG 

1001 CCGTTTAACC CaTACAAAGA AGAAAGCCAA GGGCAGGAAG TTCAGCAAAG 

1051 CGCGCAgCAA CATTCGGACA GGGCGsCAAG TTGCCACATT GGGCGGAAAA 

1101 CCGTAGCAGA ACCTAATGTA CGATAATTGG GAAGAACGCG GGAAACCGTT 

1151 TGAAGGAATC GGaCGGGGGC GTGGTCGGAT CGGCAAACTG A 

This corresponds to the amino acid sequence <SEQ ID 320; ORF84>: 

1 MAEICLITGT PGSGKTLKMV SMMANDEMFK PDEKAIRRKV FTNIKGLKIP 

51 HTYIETDAKK LPKSTDEQLS AHDMYEWIKK PENIGSIVIV DEAQDVWPAR 

101 SAGSKIPENV QWLNTHRHQG IDIEVLTQGP KLLDQNLRTL VRKHYHIASN 

151 KMGMRTLLEW KICADDPVKM ASSAFSSIYT LDKKVYDLYX XAEVHTVNKV 

201 KRSKWFYTLP VIVLLIPVFV GLSYKMLSSY GKKQEEPAAQ ESAATEQQAV 

251 LPDKTEGEPV NNGNLTADMF VPTLSEKPXS KPIYNGVRQV RTFEYIAGCI 

301 EGGRTGCACY SHQGTALKEV TELMCKDYVK NGLPFNPYKE ESQGQEVQQS 

351 AQQHSDRAQV ATLGGKPXQN LMYDNWEERG KPFEGIGGGV VGSAN* 

Further work revealed the complete nucleotide sequence <SEQ ID 321>: 

1 ATGGCAGAGA TCTGTTTGAT AACCGGCACG CCCGGTTCAG GGAAAACATT 

51 AAAAATGGTT TCCATGATGG CGAATGATGA AATGTTTAAG CCTGATGAAA 

101 ACGGCATACG CCGTAAAGTA TTTACGAACA TAAAAGGCTT GAAAATACCG 

151 CACACCTACA TAGAAACGGA CGCAAAAAAG CTGCCGAAAT CGACAGATGA 

201 GCAGCTTTCG GCGCATGATA TGTACGAATG GATAAAGAAG CCCGAAAATA 

251 TCGGGTCTAT TGTCATTGTA GATGAAGCTC AAGACGTATG GCCGGCACGC 

301 TCGGCAGGTT CAAAAATCCC TGAAAATGTC CAATGGCTGA ATACGCACAG 

351 ACATCAGGGC ATTGATATAT TTGTTTTGAC TCAAGGTCCT AAGCTTCTAG 

4 01 ATCAAAATCT TAGAACGCTT GTACGGAAAC ATTACCACAT CGCTTCAAAC 

451 AAGATGGGTA TGCGTACGCT TTTAGAATGG AAAATATGCG CGGACGATCC 

501 CGTAAAAATG GCATCAAGCG CATTCTCCAG TATCTATACA CTGGATAAAA 

551 AAGTTTATGA CTTGTACGAA TCAGCGGAAG TTCATACCGT AAATAAGGTC 

601 AAGCGGTCAA AGTGGTTTTA CACTCTGCCA GTAATAGTAT TGCTGATTCC 

651 CGTGTTTGTC GGCCTGTCCT ATAAAATGTT GAGCAGTTAC GGAAAAAAAC 

701 AGGAAGAACC CGCAGCACAA GAATCGGCGG CAACAGAACA GCAGGCAGTA 

751 CTTCCGGATA AAACAGAAGG CGAGCCGGTA AATAACGGCA ACCTTACCGC 

801 AGATATGTTT GTTCCGACAT TGTCCGAAAA ACCCGAAAGC AAGCCGATTT 

851 ATAACGGTGT AAGGCAGGTA AGAACCTTTG AATATATAGC AGGCTGTATA 

901 GAAGGCGGAA GAACCGGATG CGCCTGCTAT TCGCATCAAG GGACGGCATT 

951 GAAAGAAGTG ACGGAGTTGA TGTGCAAGGA CTATGTAAAA AACGGCTTGC 

1001 CGTTTAACCC ATACAAAGAA GAAAGCCAAG GGCAGGAAGT TCAGCAAAGC 

1051 GCGCAGCAAC ATTCGGACAG GGCGCAAGTT GCCACATTGG GCGGAAAACC 

1101 GTAGCAGAAC CTAATGTACG ATAATTGGGA AGAACGCGGG AAACCGTTTG 

1151 AAGGAATCGG CGGGGGCGTG GTCGGATCGG CAAACTGA 

This corresponds to the amino acid sequence <SEQ ID 322; ORF84-l>: 
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1 MAEICLITGT PGSGKTLKMV SMMANDEMFK PDENGIRRKV FTNIKGLKIP 

51 HTYIETDAKK LPKSTDEQLS AHDMYEWIKK PENIGSIVIV DEAQDVWPAR 

101 SAGSKIPENV QWLNTHRHQG IDIFVLTQGP KLLDQNLRTL VRKHYHIASN 

151 KMGMRTLLEW KICADDPVKM ASSAFS3IYT LDKKVYDLYE SAEVHTVNKV 

201 KRSKW FYTLP VIVLLIPVFV GL SYKMLSSY GKKQEEPAAQ ESAATEQQAV 

251 LPDKTEGEPV NNGNLTADMF VPTLSEKPES KPIYNGVRQV RTFEYIAGCI 

301 EGGRTGCACY SHQGTALKEV TELMCKDYVK NGLPFNPYKE ESQGQEVQQS 

351 AQQHSDRAQV ATLGGKP*QN LMYDNWEERG KPFEGIGGGV VG3AN* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N.meninsitidis (strain A) 

ORF84 shows 93.9% identity over a 395aa overlap with an ORF (ORF84a) from strain A oiN. 
meningitidis: 

10 20 30 40 50 60 

orf84 .pep ^4AEICLITGTPGSGKTLKMVSMMANDEMFKPDEKAIRRKVFTNIKGLKIPHTYIETDAKK 

orf84a MAEICLITGTPGSGKTLKMVSMMANDEMFKPDENGIRRKVFTNIKGLKIPHTYIETDAKK 
10 20 30 40 50 60 

70 80 90 100 110 120 

orf84 .pep LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG 

orf84a LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG 
70 80 90 100 110 120 



130 140 150 160 170 180 

orf 84 . pep IDIFVLTQGPKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT 

orf84a IDIFVLTQGSKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT 
130 140 150 160 170 180 



190 200 210 220 230 240 

orf 84 .pep LDKKVYDLYXXAEVHTVNKVKRSKW FYTLPVIVLLIPVFVGL SYKMLSSYGKKQEEPAAQ 

orf 84a LDKKVYDLYESAEVHTVNKVKRSKW FYTLPVIILLIPVFVGL SYKMLSSYGKKQEEPAAQ 
190 200 210 220 230 240 

250 260 270 280 290 300 

orf 84 . pep ESAATEQQAVLPDKTEGEPVNNGNLTADMFVPTLSEKPXSKPIYNGVRQVRTFEYIAGCI 
111111:111: I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I I I I I I I I I I I I I : 
orf 84a ESAATEHQAVFQDKTEGEPVNNGNLTADMFVPTLSEKPESKPIYNGVRQVRTFEYIAGCV 

250 260 270 280 290 300 



310 320 330 340 350 360 

orf 84 . pep EGGRTGCACYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV 

orf 84a EGGRTGCTCYSHQGTALKEITKEMCKDYARNGLPFNPYKEESQGRDVQQSEQHHSDRPQV 
310 320 330 340 350 360 



370 380 390 

orf 8 4 . pep ATLGGKPXQNLMYDNWEERGKPFEGIGGGWGSANX 

orf 84a ATLGGKPWQNLMYDNWQERGKPFEGIGGGWGSANX 
370 380 390 

The complete length ORF84a nucleotide sequence <SEQ ID 323> is: 



1 ATGGCAGAGA TCTGTTTGAT AACCGGCACG CCCGGTTCAG GGAAAACATT 

51 AAAAATGGTT TCCATGATGG CAAACGATGA AATGTTTAAG CCGGATGAAA 

101 ACGGCATACG CCGTAAAGTA TTTACGAACA TCAAAGGCTT GAAGATACCG 

151 CACACCTACA TAGAAACGGA CGCGAAAAAG CTGCCGAAAT CGACAGATGA 

201 GCAGCTTTCG GCGCATGATA TGTACGAATG GATAAAGAAG CCCGAAAATA 

251 TCGGGTCTAT TGTCATTGTA GATGAAGCTC AAGACGTATG GCCGGCACGC 

301 TCGGCAGGTT CAAAAATCCC TGAAAATGTC CAATGGCTGA ATACGCACAG 

351 ACATCAGGGC ATTGATATAT TTGTTTTGAC TCAAGGCTCT AAGCTTCTAG 

401 ATCAAAATCT TAGAACGCTT GTACGGAAAC ATTACCACAT CGCTTCAAAC 

451 AAGATGGGTA TGCGTACGCT TTTAGAATGG AAAATATGCG CGGACGATCC 
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501 CGTAAAAATG GCATCAAGCG CATTCTCCAG TATCTATACA CTGGATAAAA 

551 AAGTTTATGA CTTGTACGAA TCAGCGGAAG TTCATACCGT AAATAAGGTC 

601 AAGCGGTCAA AATGGTTTTA TACTCTGCCA GTAATAATAT TGCTGATTCC 

651 CGTTTTTGTC GGCCTGTCCT ATAAAATGTT AAGTAGTTAT GGAAAAAAAC 

5 701 AGGAAGAACC CGCAGCACAA GAATCGGCGG CAACAGAACA TCAGGCAGTA 

751 TTTCAGGATA AAACAGAAGG CGAGCCGGTA AACAACGGTA ACCTTACCGC 

801 AGATATGTTT GTTCCGACAT TGTCCGAAAA ACCCGAAAGC AAGCCGATTT 

851 ATAACGGTGT AAGGCAGGTA AGAACCTTTG AATATATAGC AGGCTGTGTA 

901 GAAGGCGGAA GAACCGGATG CACATGCTAT TCGCATCAAG GGACGGCATT 

IQ 951 GAAAGAAATT ACAAAGGAAA TGTGCAAGGA TTACGCAAGA AACGGATTGC 

1001 CGTTTAACCC ATATAAAGAA GAAAGCCAAG GGCGGGATGT CCAGCAAAGT 

1051 GAGCAGCACC ATTCGGACAG ACCGCAAGTT GCCACGTTGG GCGGAAAGCC 

1101 GTGGCAAAAT CTTATGTATG ATAATTGGCA GGAGCGCGGA AAACCGTTTG 

1151 AAGGAATCGG CGGGGGCGTG GTCGGATCGG CAAACTGA 

1 5 This encodes a protein having amino acid sequence <SEQ ID 324>: 

1 MAEICLITGT PGSGKTLKMV SMMANDEMFK PDENGIRRKV FTNIKGLKIP 

51 HTYIETDAKK LPKSTDEQLS AHDMYEWIKK PENIGSIVIV DEAQDVWPAR 

101 SAG3KIPENV QWLNTHRHQG IDIFVLTQGS KLLDQNLRTL VRKHYHIASN 

151 KMGMRTLLEW KICADDPVKM ASSAFSSIYT LDKKVYDLYE SAEVHTVNKV 

20 201 KRSKW FYTLP VIILLIPVFV GL SYKMLSSY GKKQEEPAAQ ESAATEHQAV 

251 FQDKTEGEPV NNGNLTADMF VPTLSEKPES KPIYNGVRQV RTFEYIAGCV 

301 EGGRTGCTCY SHQGTALKEI TKEMCKDYAR NGLPFNPYKE ESQGRDVQQS 

351 EQHHSDRPQV ATLGGKPWQN LMYDNWQERG KPFEGIGGGV VGSAN* 

ORF84a and ORF84-1 show 95.2% identity in 395 aa overlap: 

25 10 20 30 40 50 60 

orf84a.pep maeiclitgtpgsgktlkmvsmmandemfkpdengirrkvftnikglkiphtyietdakk 

orf84-l [IJJiiciiTGTPGSGKTLKMVSMMANDEMFKPDENGIRRKVFTNIKGLKIPHTYIETDAKK 



lpkstdeqlsahdmyewikkpenigsivivdeaqdvwparsagskipenvqwlnthrhqg 
lpkstdeqlsahdmyewikkpenigsivivdeaqd 



I DI FVLTQGSKLLDQNLRTLVRKHYHI ASNKMGMRTLLEWKICADaPVKMAS SAFS S I YT 

I I I I I I I I I 1 I I I I I I I 1 I I I I I I I I I I I I I ! I I I I I I I I I I I I ! I I I I I I I I I I M I I 
I DI FVLTQGPKLLDQNLRTLVRKHYHI ASNKMGMRTLLEWKI CADDPVKMAS S AFS S I YT 



irf84a.pep ldkkvydlyesaevhtvnkvkrskwfytlpviillipvfvglsykmlssygkkqeepaaq 
irf84-l ldkiwydlyesaevhtvnkvkrskwfytl 



esaatehqavfqdktegepvnngnltadmfvptlsekpeskpiyngvrqvrtfeyiagcv 
esaateqqavlpdktegepvnngnltadmfvptlsekpeskpiyngvrqvrtfeyiagci 



eggrtgctcyshqgtalkeitkemckdyarnglpfnpykeesqgrdvqqseqhhsdrpqv 
eggrtgcacyshqgtalkevtelmckdyvknglpfnpykeesqgqevqqsaq 



atlggkpwqnlmydnwqergkpfegigggwgsanx 

I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I 

atlggkpxqnlmydnweergkpfegigggwgsanx 
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Homology with a predicted ORF from N. gonorrhoeae 

ORF84 shows 94.2% identity over a 395aa overlap with a predicted ORF (ORF84.ng) from 
gonorrhoeae: 

orf84 .pep MAEICLITGTPGSGKTLKMVSMMANDEMFKPDEKAIRRKVFTNIKGLKIPHTYIETDAKK 60 

orf8 4ng MAEICLITGTPGSGKTLKMVSbMANDEMFKPDENGVRRKVFTNIKGLKIPHTHIETDAKK 60 

orf 84 .pep LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG 120 

orf84ng LPKSTDEQLSAHDMYEWIKKPENVGAIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG 120 

orf 84 .pep IDIFVLTQGPKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT 180 

orf84ng IDIFVLTQGPKLLDQNLRTLVKRHYHIAANKMGLRTLLEWKVCADDPVKMASSAFSSIYT 180 

orf 84 .pep LDKKVYDLYXXAEVHTVNKVKRSKWFYTLPVIVLLIPVFVGLSYKMLSSYGKKQEEPAAQ 240 

orf84ng LDKKVYDLYESAEIHTVNfCVKRSKWFYALPVIILLIPLFVGLSYKMLGSYGKKQEEPAAQ 240 

orf 84 .pep ESRATEQQAVLPDKTEGEPVNNGNLTADMFVPTLSEKPXSKPIYNGVRQVRTFEYIAGCI 300 

orf84ng ESAATEQQAVLPDKTEGESVNNGNLTADMFVPTLPEKPESKPIYNGVRQVRTFEYIAGCI 300 

orf 84 .pep EGGRTGCACYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV 360 

orf84ng EGGRTGCTCYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV 360 
orf84.pep ATLGGKPXQNLMYDNWEERGKPFEGIGGGWGSAN 395 
orf84ng ATLGGKPQQNLMYDNWEERGKPFEGIGGGWGSAN 395 

The complete length ORF84ng nucleotide sequence <SEQ ID 325> is: 



1 ATGGCAGAAA TCTGTTTGAT AACCGGCACG CCCGGTTCAG GGAAAACATT 

51 AAAAATGGTT TCCATGATGG C7VAACGATGA AATGTTTAAG CCAGATGAAA 

101 ACGGCGTACG CCGTTVAAGTA TTTACGAACA TCSAAGGTTT GAAGATACCG 

151 CACACCCACA TAGAAACAGA CGCAAAGAAG CTGCCGAAAT CAACCGATGA 

201 ACAGCTTTCG GCGCATGATA TGTATGAATG GATCAAGAAG CCTGAAAacg 

251 tcggcgCAAT CGTTATTGTC GATGAGGCGC AAGACGTATG GCCCGCACGC 

301 TccgCAGGTT CGAAAATCCC CGAAAACGTC CAATGGCTGA ACACACACAG 

351 GCATCAGGGC ATAGATATAT TTGTATTGAC ACAAGGTCCT AAACTCTTAG 

401 ATCAGAACTT GCGAACATTG GTTAAAAGAC ATTACCACAT TGCGGCCAAC 

451 AAAATGGGTT TGCGTACCCT GCTTGAATGG AAAGTATGCG CGGATGACCC 

501 GGTAAAAATG GCATCAAGTG CATTTTCCAG TATCTACACA CTGGATAAAA 

551 AAGTTTATGA CTTGTACGAA TCCGCAGAAA TTCACACGGT AAACAAAGTC 

601 AAGCGTTCAA AATGGTTTTA TGCATTGCCC GTCATCATAT TATTGATTCC 

651 GCTATTTGTC GGTTTGTCTT ACAAAATGTT GGGCAGTTAC GGAAAAAAAC 

701 AGGAAGAACC CGCAGCACAA GAATCGGCGG CAACAGAACA GCAGGCAGTA 

751 CTTCCGGATA AAACAGAAGG AGAATCGGTG AATAACGGAA ACCTTACGGC 

801 AGATATGTTT GTTCCGACAT TGCCCGAAAA ACCCGAAAGC AAGCCGATTT 

851 ATAACGGTGT AAGGCAGGTA AGGACCTTTG AATATATAGC AGGCTGTATA 

901 GAAGGCGGAA GAACCGGATG CACCTGCTAT TCGCATCAAG GGACGGCATT 

951 GAAAGAAGTG ACGGAGTTGA TGTGCAAGGA CTATGTAAAA AACGGCTTGC 

1001 CGTTTAACCC ATACAAAGAA GAAAGCCAAG GGCAGGAAGT TCAGCAAAGC 

1051 GCGCAGCAAC ATTCGGACAG GGCGCAAGTT GCCACCTTGG GCGGAAAACC 

1101 GCAGCAGAAC CTAATGTACG ACAATTGGGA AGAACGCGGG AAACCGTTTG 

1151 AAGGAATCGG CGGGGGCGTG GTCGGATCGG CAAACTGA 

This encodes a protein having amino acid sequence <SEQ ID 326>: 



MAEICLIT GT PGSGKT LKMV SMMANDEMFK PDENGVRRKV FTNIKGLKIP 
HTHIETDAKK LPKSTDEQLS AHDMYEWIKK PENVGAIVIV DEAQDVWPAR 
SAGSKIPENV QWLNTHRHQG IDIFVLTQGP KLLDQNLRTL VKRHYHIAAN 
KMGLRTLLEW KVCADDPVKM ASSAFSSIYT LDKKVYDLYE SAEIHTVNKV 
KRSKW FYALP VIILLIPLFV GL SYKMLGSY GKKQEEPAAQ ESAATEQQAV 
LPDKTEGESV NNGNLTADMF VPTLPEKPES KPIYNGVRQV RTFEYIAGCI 
EGGRTGCTCY SHQGTALKEV TELMCKDYVK NGLPFNPYKE ESQGQEVQQS 
AQQHSDRAQV ATLGGKPQQN LMYDNWEERG KPFEGIGGGV VGSAN* 
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ORF84ng and ORF84-1 show 95.4% identity in 395 aa overlap: 



MAEICLITGTPGSGKTLKMVSMMANDEMFKPDENGIRRKVFTNIKGLKIPHTYIETDAKK 
MAEICLITGTPGSGKTLKMVSMMANDEMFKPDENGVRRKVFTNIKGLKIPHTHIETDAKK 



)rf84-l.pep 
)rf84ng 



LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG 
LPKSTDEQLSAHDMYEWIKKPENVGAIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG 



)rf84-l.pep 
)rf84ng 



IDIFVLTQGPKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT 
IDIFVLTQGPKLLDQNLRTLVKRHYHIAANKMGLRTLLEWKVCADDPVKMASSAFSSIYT 



orf84-l.pep 
orf84ng 



LDKKVYDLYESAEVHTVNKVKRSKWFYTLPVIVLLIPVFVGLSYKMLSSYGKKQEEPAAQ 
LDKKVYDLYESAEIHTVNKVKRSKWFYALPVIILLIPLFVGLSYKMLGSYGKKQEEPAAQ 



orf84-l .pep 
or£84ng 



ESAATEQQAVLPDKTEGEPVNNGNLTADMFVPTLSEKPESKPIYNGVRQVRTFEYIAGCI 
ESAATEQQAVLPDKTEGESVNNGNLTADMFVPTLPEKPESKPIYNGVRQVRTFEYIAGCI 



)rf84ng 



EGGRTGCACYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV 
I I I I I I I : I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
EGGRTGCTCYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV 



3rf 84-1 .pep 
)rf84ng 



ATLGGKPXQNLMYDNWEERGKPFEGIGGGWGSANX 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
ATLGGKPQQNLMYDNWEERGKPFEGIGGGVVGSANX 



Based on this analysis, includng the presence of a putative transmembrane domain (single- 
underlined) in the gonococcal protein, and a putative ATP/GTP-binding site motif A (P-loop, 
45 double-imderiined), it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 39 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 327>: 



GTGGTTTTCC 
TGAAGTCAAA 
CGCGTGATTT 
AAACTCGAGC 
CACGATTTAT 
AGGCGTGGAA 
ACATCCATAC 
TGAGTTCGAT 
CGGAACGGGA 
ACTCAGGAAG 
TATCCGTGAT 
CGGTTTTGCA 



TGAATGCCGA 
CTGAAAAAAT 
CGCCAGCGAT 
GCACCATCCG 
CAGGCGAGTT 
TTTGGGTGAT 
ACCAGTTTCC 
CAGTTCACTT 
AAAAAGCCTG 
GTCACAAATA 
GCGCCAGGCC 
GGAACAGGAT 



CAACGGGATA 
TCCATATCGA 
ATTGAAGTGA 
CGTGAACCAT 
TTGCCGACGG 
GCTTCGCGCG 
GTTGGAAATT 
CTATGAATGT 
AAATCCACGC 
CACCAAT . . . 
AGGCGGTCGA 
TATTTTTGGA 



TTGGTTCAGG 
TTTTTACAAT 
CGGACAAGGC 
CCTTTGACCT 
CGGTTCGGAT 
AGCCTGTCGT 
GGCAAACACA 
GGAGGACATG 
TGCCCGATGT 



ACTTGCCTTT 
ACGGGTATGC 
AACCGGTGAG 
TGCACGGCAT 
TTGACATTCA 
GTTGAAGGCA 
AATATCGTCT 
AGCGAGGGCG 
CCGCGCCGTT 

TACCG 

TATATGCTGC 
GCGCAGCGC . 
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601 TTGCAGCAGC MTACCGCTG GCTGCGTATC CCCTTGGACA AGCAGTTGAA 

651 AGCGGACACC TTTATGGCAT TGCGTGAGTT TTTGAAAGAT GGGGAAGGGC 

701 GCAAACGTCT .GTTGCCGAC GCAACCAAAG GCGCACCTGC CGAAATCCGC 

751 GAACAATTCA TGCTGGCTGC GGAAAACACG CTGAACATCT TTGCACAAAA 

801 AGGCTATTTG GGATTGGACG AATTTATTAC GTCCAATATC CCGAAAGAGC 

851 AGCAGGATAA GATGCAGGGC TATTTCTACG AAATGCTTTA CGGCGTGATG 

901 AACGCTGCTT TGGATGAAAC CAT.ACCCGG TACGGCTTGC CCGAATGGCA 

951 GCAGGATGAA GCGCGGAATC GTTTCCTGCT GCACAGTATG GATGCGTACA 

1001 CGGGTTTGAC CGAATATCCC GCGCCTATGC TGCTGCAACT TGATGGGTTT 

1051 TCCGAGGTGC GTTCGTCGGG TTTGCAGATG ACCCGTTCCC C.GGTCCGCT 

1101 TTTGGTCTAT CTC. . . 

This corresponds to the amino acid sequence <SEQ ID 328; ORF88>: 

1 MVFLNADNGI LVQDLPFEVK LKKFHIDFYN TGMPRDFASD lEVTDKATGE 

51 KLERTIRVNH PLTLHGITIY QASFADGGSD LTFKAWNLGD ASREPWLKA 

101 TSIHQFPLEI GKHKYRLEFD QFTSMNVEDM SEGAEREKSL KSTLPDVRAV 

151 TQEGHKYTNX XXXXXYRIRD APGQAVEYKN YMLPVLQEQD YFWITGTRSX 

201 LQQQYRWLRI PLDKQLKADT FMALREFLKD GEGRKRXVAD ATKGAPAEIR 

251 EQFMLAAENT LNIFAQKGYL GLDEFITSNI PKEQQDKMQG YFYEMLYGVM 

301 NAALDETXTR YGLPEWQQDE ARNRFLLHSM DAYTGLTEYP APMLLQLDGF 

351 SEVRSSGLQM TRSXGPLLVY L... 

Further work revealed the complete nucleotide sequence <SEQ ID 329>: 



1 ATGAGTAAAT CCCGTAGATC TCCCCCACTT CTTTCCCGTC CGTGGTTCGC 

51 TTTTTTCAGC TCCATGCGCT TTGCAGTCGC TTTGCTCAGT CTGCTGGGTA 

101 TTGCATCGGT TATCGGTACG GTGTTGCAGC AAAACCAGCC GCAGACGGAT 

151 TATTTGGTCA AATTCGGATC GTTTTGGGCG CAGATTTTTG GTTTTCTGGG 

201 ACTGTATGAC GTCTATGCTT CGGCATGGTT TGTCGTTATC ATGATGTTTT 

251 TGGTGGTTTC TACCAGTTTG TGCCTGATTC GCAATGTGCC GCCGTTCTGG 

301 CGCGAAATGA AGTCTTTTCG GGAAAAGGTT AAAGAAAAAT CTCTGGCGGC 

351 GATGCGCCAT TCTTCGCTGT TGGATGTARA AATTGCGCCC GAGGTTGCCA 

401 AACGTTATCT GGAAGTACAA GGTTTTCAGG GAAAAACCAT TAACCGTGAA 

4 51 GACGGGTCGG TTCTGATTGC CGCCAAAAAA GGCACAATGA ACAAATGGGG 

501 CTATATCTTT GCCCATGTTG CTTTGATTGT CATTTGCCTG GGCGGGTTGA 

551 TAGACAGTAA CCTGCTGTTG AAACTGGGTA TGCTGACCGG TCGGATTGTT 

601 CCGGACAATC AGGCGGTTTA TGCCAAGGAT TTCAAGCCCG AAAGTATTTT 

651 GGGTGCGTCC AATCTCTCAT TTAGGGGCAA CGTCAATATT TCCGAGGGGC 

7 01 AGAGTGCGGA TGTGGTTTTC CTGAATGCCG ACAACGGGAT ATTGGTTCAG 

7 51 GACTTGCCTT TTGAAGTCAA ACTGAAAAAA TTCCATATCG ATTTTTACAA 

801 TACGGGTATG CCGCGTGATT TCGCCAGCGA TATTGAAGTG ACGGACAAGG 

851 CAACCGGTGA GAAACTCGAG CGCACCATCC GCGTGAACCA TCCTTTGACC 

901 TTGCACGGCA TCACGATTTA TCAGGCGAGT TTTGCCGACG GCGGTTCGGA 

951 TTTGACATTC AAGGCGTGGA ATTTGGGTGA TGCTTCGCGC GAGCCTGTCG 

1001 TGTTGAAGGC AACATCCATA CACCAGTTTC CGTTGGAAAT TGGCAAACAC 

1051 AAATATCGTC TTGAGTTCGA TCAGTTCACT TCTATGAATG TGGAGGACAT 

1101 GAGCGAGGGC GCGGAACGGG AAAAAAGCCT GAAATCCACG CTGAACGATG 

1151 TCCGCGCCGT TACTCAGGAA GGTAAAAAAT ACACCAATAT CGGCCCTTCC 

1201 ATTGTTTACC GTATCCGTGA TGCGGCAGGG CAGGCGGTCG AATATAAAAA 

1251 CTATATGCTG CCGGTTTTGC AGGAACAGGA TTATTTTTGG ATTACCGGCA 

1301 CGCGCAGCGG CTTGCAGCAG CAATACCGCT GGCTGCGTAT CCCCTTGGAC 

1351 AAGCAGTTGA AAGCGGACAC CTTTATGGCA TTGCGTGAGT TTTTGAAAGA 

14 01 TGGGGAAGGG CGCAAACGTC TGGTTGCCGA CGCAACCAAA GGCGCACCTG 

1451 CCGAAATCCG CGAACAATTC ATGCTGGCTG CGGAAAACAC GCTGAACATC 

1501 TTTGCACAAA AAGGCTATTT GGGATTGGAC GAATTTATTA CGTCCAATAT 

1551 CCCGAAAGAG CAGCAGGATA AGATGCAGGG CTATTTCTAC GAAATGCTTT 

1601 ACGGCGTGAT GAACGCTGCT TTGGATGAAA CCATACGCCG GTACGGCTTG 

1651 CCCGAATGGC AGCAGGATGA AGCGCGGAAT CGTTTCCTGC TGCACAGTAT 

1701 GGATGCGTAC ACGGGTTTGA CCGAATATCC CGCGCCTATG CTGCTGCAAC 

1751 TTGATGGGTT TTCCGAGGTG CGTTCGTCGG GTTTGCAGAT GACCCGTTCC 

1801 CCGGGTGCGC TTTTGGTCTA TCTCGGCTCG GTGCTGTTGG TATTGGGTAC 

1851 GGTATTGATG TTTTATGTGC GCGAAAAACG GGCGTGGGTA TTGTTTTCAG 

1901 ACGGCAAAAT CCGTTTTGCC ATGTCTTCGG CCCGCAGCGA ACGGGATTTG 

1951 CAGAAGGAAT TTCCAAAACA CGTCGAGAGT CTGCAACGGC TCGGCAAGGA 

2001 CTTGAATCAT GACTGA 

This corresponds to the amino acid sequence <SEQ ID 330; ORF88-l>: 



1 MSKSRRSPPL LSRPWFAFFS SMRF AVALLS LLGIASVIGT VL QQNQPQTD 
51 YLVKFGSFWA QIFGFLGLYD VYASAW FWI MMFLWSTSL CLI RNVPPFW 
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REMKSFREKV 
DGSVLIAAKK 
PDNQAVYAKD 
DLPFEVKLKK 
LHGITIYQAS 
KYRLEFDQFT 
IVYRIRDAAG 
KQLKADTFMA 
FAQKGYLGLD 
PEWQQDEARN 
PGALLVYLGS 



KEKSLAAMRH 
GTMNKWG YIF 
FKPESILGAS 
FHIDFYNTGM 
FADGGSDLTF 
SMNVEDMSEG 
QAVEYKNYML 
LREFLKDGEG 
EFITSNIPKE 
RFLLHSMDAY 
VLLVLGTVLM 



PRDFASDIEV 
KAWNLGDASR 
AEREKSLKST 
PVLQEQDYFW 
RKRLVADATK 
QQDKMQGYFY 
TGLTEYPAPM 
FYVREKRAWV 



EVAKRYLEVQ 
GGLI DSNLLL 
SEGQSADWF 
TDKATGEKLE 
EPWLKATSI 
LNDVRAVTQE 
ITGTRSGLQQ 
GAPAEIREQF 
EMLYGVMNAA 
LLQLDGFSEV 
LFSDGKIRFA 



GFQGKTINRE 
KLGMLTGRTV 
LNADNGILVQ 
RTIRVNHPLT 
HQFPLEIGKH 
GKKYTNIGPS 
QYRWLRIPLD 
MLAAENTLNI 
LDETIRRYGL 
RSSGLQMTRS 
M3SARSERDL 



651 QKEFPKHVES LQRLGKDLNH D* 



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meninsitidis (strain A) 

ORF88 shows 95.7% identity over a 371aa overlap with an ORF (ORF88a) from strain A of A''. 
meningitidis: 

10 20 30 

orf 88 . pep MVFLNADNGILVQDLPFEVKLKKFHIDFYN 

orf88a AKDFKPESILGASNLSFRGNVNISEGQSADVVFLNADNGILVQDLPFEVKLKKFHIDFYN 
210 220 230 240 250 260 

40 50 60 70 80 90 

orf 88 . pep TGMPRDFASDIEVTDKATGEKLERTIRVNHPLTLHGITIYQASFADGGSDLTFKAWNLGD 

orf 88a TGMPRDFASDIEVTDKATGEKLERTIRVNHPLTLHGITIYQASFADGGSDLTFKAWNLGD 
270 280 290 300 310 320 

100 110 120 130 140 150 

orf 88. pep ASREPWLKATSIHQFPLEIGKHKYRLEFDQFTSMNVEDMSEGAEREKSLKSTLPDVRAV 

M I I I I I I I I I ! I I I 1 I I I I ! I I I I I I I I I I I I I I I I I ! I M I I I I I I I I I I I I I I I I I 

orf 88a ASREPWLKATSIHQFPLEIGKHKYRLEFDQFTSMNVEDMSEGAEREKSLKSTLNDVRAV 
330 340 350 360 370 380 

160 170 180 190 200 210 

orf 88 . pep TQEGHKYTNXXXXXXYRIRDAPGQAVEYKNYMLPVLQEQDYFWITGTRSXLQQQYRWLRI 

orf 88a TQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYMLPVLQEQDYFWITGTRSGLQQQYRWLRI 
390 400 410 420 430 440 

220 230 240 250 260 270 

orf 88. pep PLDKQLKADTFMALREFLKDGEGRKRXVADATKGAPAEIREQFMLAAENTLNIFAQKGYL 

orf 88a PLDKQLKADTFMALREFLKDGEGRKRLVADATKGAPAEIREQFMLAAENTLNIFAQKGYL 



GLDEFITSNIPKEQQDKMQGYFYEMLYGVMNAALDETXTRYGLPEWQQDEARNRFLLHSM 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I M 
GLDEFITSNIPKEQQDKMQGYFYEMLYGVMNAALDETIRRYGLPEWQQDEARNRFLLHSM 



ir f 8 8 . pep DAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRSXGP LLVYL 

.rf8 8a DAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRSPGA LLVYLGSVLLVLGTVLM FYVREKR 



AWVLFSDGKIRFAMSSARSERDLQKEFPKHVESLQRLGKDLNHDX 



The complete length ORFSSa nucleotide sequence <SEQ ID 33 1> is: 

1 ATGAGTAAAT CCCGTAGATC TCCCCCACTT CTTTCCCGTC CGTGGTTCGC 
51 TTTTTTCAGC TCCATGCGCT TTGCGGTCGC TTTGCTCAGT CTGCTGGGTA 
101 TTGCATCGGT TATCGGTACG GTGTTGCAGC AAAACCAGCC GCAGACGGAT 
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151 TATTTGGTCA AATTCGGATC GTTTTGGGCG CAGATTTTTG GTTTTCTGGG 

201 ACTGTATGAC GTCTATGCTT CGGCATGGTT TGTCGTTATC ATGATGTTTT 

251 TGGTGGTTTC TACCAGTTTG TGCCTGATTC GCAATGTGCC GCCGTTCTGG 

301 CGCGAAATGA AGTCTTTTCG GGAAAAGGTT AAAGAAAAAT CTCTGGCGGC 

351 GATGCGCCAT TCTTCGCTGT TGGATGTAAA AATTGCGCCC GAGGTTGCCA 

401 AACGTTATCT GGAAGTACAA GGTTTTCAGG GAAAAACCAT TAACCGTGAA 

451 GACGGGTCGG TTCTGATTGC CGCCAAAAAA GGCACAATGA ACAAATGGGG 

501 CTATATCTTT GCCCATGTTG CTTTGATTGT CATTTGCCTG GGCGGGTTGA 

551 TAGACAGTAA CCTGCTGTTG AAACTGGGTA TGCTGACCGG TCGGATTGTT 

601 CCGGACAATC AGGCGGTTTA TGCCAAGGAT TTCAAGCCCG AAAGTATTTT 

651 GGGTGCGTCC AATCTCTCAT TTAGGGGCAA CGTCAATATT TCCGAGGGGC 

701 AGAGTGCGGA TGTGGTTTTC CTGAATGCCG ACAACGGGAT ATTGGTTCAG 

751 GACTTGCCTT TTGAAGTCAA ACTGAAAAAA TTCCATATCG ATTTTTACAA 

801 TACGGGTATG CCGCGCGATT TTGCCAGTGA TATTGAAGTA ACGGATAAGG 

851 CAACCGGTGA GAAACTCGAG CGCACCATCC GCGTGAACCA TCCTTTGACC 

901 TTGCACGGCA TCACGATTTA TCAGGCGAGT TTTGCCGACG GCGGTTCGGA 

951 TTTGACATTC AAGGCGTGGA ATTTGGGTGA TGCTTCGCGC GAGCCTGTCG 

1001 TGTTGAAGGC AACATCCATA CACCAGTTTC CGTTGGAAAT TGGCAAACAC 

1051 AAATATCGTC TTGAGTTCGA TCAGTTTACT TCTATGAATG TGGAGGACAT 

1101 GAGCGAGGGC GCGGAACGGG AAAAAAGCCT GAAATCCACG CTGAACGATG 

1151 TCCGCGCCGT TACTCAGGAA GGTAAAAAAT ACACCAATAT CGGCCCTTCC 

1201 ATTGTTTACC GTATCCGTGA TGCGGCAGGG CAGGCGGTCG AATATAAAAA 

1251 CTATATGCTG CCGGTTTTGC AGGAACAGGA TTATTTTTGG ATTACCGGCA 

1301 CGCGCAGCGG CTTGCAGCAG CAATACCGCT GGCTGCGTAT CCCCTTGGAC 

1351 AAGCAGTTGA AAGCGGACAC CTTTATGGCA TTGCGTGAGT TTTTGAAAGA 

1401 TGGGGAAGGG CGCAAACGTC TGGTTGCCGA CGCAACCAAA GGCGCACCTG 

1451 CCGAAATCCG CGAACAATTC ATGCTGGCTG CGGAAAACAC GCTGAACATC 

1501 TTTGCACAAA AAGGCTATTT GGGATTGGAC GAATTTATTA CGTCCAATAT 

1551 CCCGAAAGAG CAGCAGGATA AGATGCAGGG CTATTTCTAC GAAATGCTTT 

1601 ACGGCGTGAT GAACGCTGCT TTGGATGAAA CCATACGCCG GTACGGCTTG 

1651 CCCGAATGGC AGCAGGATGA AGCGCGGAAT CGTTTCCTGC TGCACAGTAT 

1701 GGATGCGTAC ACGGGTTTGA CCGAATATCC CGCGCCTATG CTGCTGCAAC 

1751 TTGATGGGTT TTCCGAGGTG CGTTCGTCGG GTTTGCAGAT GACCCGTTCC 

1801 CCGGGTGCGC TTTTGGTCTA TCTCGGCTCG GTGCTGTTGG TATTGGGTAC 

1851 GGTATTGATG TTTTATGTGC GCGAAAAACG GGCGTGGGTA TTGTTTTCAG 

1901 ACGGCAAAAT CCGTTTTGCC ATGTCTTCGG CCCGCAGCGA ACGGGATTTG 

1951 CAGAAGGAAT TTCCAAAACA CGTCGAGAGT CTGCAACGGC TCGGCAAGGA 

2 001 CTTGAATCAT GACTGA 

This encodes a protein having amino acid sequence <SEQ ID 332>: 

1 MSKSRRSPPL LSRPWFAFFS SMRF AVALLS LLGIASVIGT VL QQNQPQTD 

51 YLVKFGSFWA QIFGFLGLYD VYASAW FWI MMFLWSTSL CLI RNVPPFW 

101 REMKSFREKV KEKSLAAMRH SSLLDVKIAP EVAKRYLEVQ GFQGKTINRE 

151 DGSVLIAAKK GTMNKWG YIF AHVALIVICL GGLI DSNLLL KLGMLTGRIV 

201 PDNQAVYAKD FKPESILGAS NLSFRGNVNI SEGQSADWF LNADNGILVQ 

251 DLPFEVKLKK FHIDFYNTGM PRDFASDIEV TDKATGEKLE RTIRVNHPLT 

301 LHGITIYQAS FADGGSDLTF KAWNLGDASR EPWLKATSI HQFPLEIGKH 

351 KYRLEFDQFT 3MNVEDMSEG AEREKSLKST LNDVRAVTQE GKKYTNIGPS 

4 01 IVYRIRDAAG QAVEYKNYML PVLQEQDYFW ITGTRSGLQQ QYRWLRIPLD 

451 KQLKADTFMA LREFLKDGEG RKRLVADATK GAPAEIREQF MLAAENTLNI 

501 FAQKGYLGLD EFITSNIPKE QQDKMQGYFY EMLYGVMNAA LDETIRRYGL 

551 PEWQQDEARN RFLLH3MDAY TGLTEYPAPM LLQLDGFSEV RSSGLQMTRS 

601 PG ALLVYLGS VLLVLGTVLM FYVREKRAWV LFSDGKIRFA MSSARSERDL 

651 QKEFPKHVES LQRLGKDLNH D* 

ORF88a and ORF88-1 100.0% identity in 671 aa overlap: 



orfSSa.pep MSKSRRSPPLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGSFWA 60 

orf88-l MSKSRRSPPLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGSFWA 60 

orf88a.pep QIFGFLGLYDVYASAWFVVIMMFLWSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 120 

orf88-l QIFGFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 120 

orf 88a . pep SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL 180 

Orf88-l SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL 180 



orf 88a. pep 



GGLIDSNLLLKLGMLTGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNISEGQSADWF 240 
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or f 8 8 - 1 GGLIDSNLLLKLGMLTGRIVPDNQAVYAKDFKPES ILGASNLS FRGNVN I SEGQSADVVF 240 

orf88a.pep LNADNGILVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNHPLT 300 

orf88-l LNADNGILVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNHPLT 300 

orf88a.pep LHGITIYQASFADGGSDLTFKAWNLGDASREPVVLKATSIHQFPLEIGKHKYRLEFDQFT 360 

I I I I 1 I I I I I I I I I I ! I I ! I I I i I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

orf88-l LHGITIYQASFADGGSDLTFKAWNLGDASREPVVLKATSIHQFPLEIGKHKYRLEFDQFT 360 

orf 88a . pep SMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYML 420 

I I I I I I i I I I I I I I !! I I I I I I i I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 88-1 SMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYML 420 

orf 88a . pep PVLQEQDYFWITGTRSGLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRLVADATK 480 

orf 8 8-1 PVLQEQDYFWITGTRSGLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRLVADATK 480 

orf 88a . pep GAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKEQQDKMQGYFYEMLYGVMNAA 540 

I I I I I I I I I I I i I I I I I M I I I I I I I I I I M! I I I I I i I I I I I I I I I I I I I I I I I I I I I I 

or f 8 8 - 1 GAPAEIREQFMLAAENTLNI FAQKGYLGLDEFITSNI PKEQQDKMQGYFYEMLYGVMNAA 540 

orf 88a . pep LDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRS 600 

I I I I 1 I I I I I I I I I I ! I I I I I I I ! I I I ! I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I 

orf 88-1 LDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRS 600 

orf 88a . pep PGALLVYLGSVLLVLGTVLMFYVREKRAWVLFSDGKIRFAMSSARSERDLQKEFPKHVES 660 

orf 88-1 PGALLVYLGSVLLVLGTVLMFYVREKRAWVLFSDGKIRFAMSSARSERDLQKEFPKHVES 660 

orf 88a. pep LQRLGKDLNHD 672 

orf88-l LQRLGKDLNHD 672 

Homology with a predicted ORF from N.sonorrhoeae 
ORF88 shows 93. 
gonorrhoeae: 

orf 88 .pep 
orf 88ng 
orf 88 .pep 
orf88ng 
orf 88 .pep 
orf88ng 
orf 88 .pep 
orfSSng 
orf 88 .pep 
orfSSng 
orf 88 .pep 
orf88ng 
orf88 .pep 
orfBSng 



5% identity over a 371aa overlap with a predicted ORF (ORF88.ng) from N. 



MVFLNADNGILVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNH 
MVFLNADNGMLVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNH 



60 



PLTLHGITIYQASFADGGSDLTFKAWNLGDASREPWLKATSIHQFPLEIGKHKYRLEFD 120 

PLTLHGITIYQASFADGGSDLTFKAWNLRDASREPWLKATSIHQFPLEIGKHKYRLEFD 120 

QFTSMNVEDMSEGAEREKSLKSTLPDVRAVTQEGHKYTNXXXXXXYRIRDAPGQAVEYKN 180 

QFTSMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKN 180 

YMLPVLQEQDYFWITGTRSXLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRXVAD 240 
I I ! I : I I : : I I M : I I I I I I I j I I I I I I I I I I ! I I I ! I I I I ! I ! I I I M I I I I I I III 

YMLPILQDKDYFWLTGTRSGLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRLVAD 240 

ATKGAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKEQQDKMQGYFYEMLYGVM 300 

IN I I I I I I I I I I I I I I I I II I I I I I M { I M II II I II I I I I I II II I I I I I II II I 

ATKDAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKGQQDKMQGYFYEMLYGVM 300 

NAALDETXTRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQM 360 

I I I I II I I I I II I I I I I I I I I I I I I I I II I I I I I II II I I I I I I I II I I II I I II II I 

NAALDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQM 360 

TRSXGPLLVYL 371 

III I I M I I 

TRSPGALLVYLGSVLLVLGTVFMFYVPKKRAWVLFSNXKIRFAMSSARSERDLQKEFPKH 420 



-223- 



An ORF88ng nucleotide sequence <SEQ ID 333> was predicted to encode a protein having amino 
acid sequence <SEQ ID 334>: 

1 MVFLNADNGM LVQDLPFEVK LKKFHIDFYN TGMPRDFASD lEVTDKATGE 

51 KLERTIRVNH PLTLHGITIY QASFADGGSD LTFKAWNLRD ASREPWLKA 

101 TSIHQFPLEI GKHKYRLEFD QFTSMNVEDM SEGAEREKSL KSTLNDVRAV 

151 TQEGKKYTNI GPSIVYRIRD AAGQAVEYKN YMLPILQDKD YFWLTGTRSG 

201 LQQQYRWLRI PLDKQLKADT FMALREFLKD GEGRKRLVAD ATKDAPAEIR 

251 EQFMLAAENT LNIFAQKGYL GLDEFITSNI PKGQQDKMQG YFYEMLYGVM 

301 NAALDETIRR YGLPEWQQDE ARNRFLLHSM DAYTGLTEYP APMLLQLDGF 

351 SEVRSSGLQM TRSPG ALLVY LGSVLLVLGT VFM FYVPKKR AWVLFSNXKI 

401 RFAMSSARSE RDLQKEFPKH VESLQRLGKD LNHD* 

Further work revealed the complete gonococcal DNA sequence <SEQ ID 335>: 

1 ATGAGTAAAT CCCGTATATC TCCCACACTT CTTTCCCGTC CGTGGTTCGC 

51 TTTTTTCAGC TCCATGCGCT TTGCGGTCGC TTTGCTCAGT CTGCTGGGTA 

101 TTGCATCGGT TATCGGCACG GTGTTACAGC AAAACCAGCC GCAGACGGAT 

151 TATTTGGTCA AATTCGGACC GTTTTGGACT CGGATTTTTG ATTTTTTGGG 

201 TTTGTATGAT GTCTATGCTT CGGCATGGTT TGTCGTTATC ATGATGTTTC 

251 TGGTGGTTTC TACCAGTTTG TGTTTAATCC GTAACGTTCC GCCGTTTTGG 

301 CGCGAAATGA AGTCTTTCCG GGAAAAGGTT AAAGAAAAAT CTCTGGCGGC 

351 GATGCGCCAT TCTTCGCTGT TGGATGTAAA AATTGCCCCC GAAGTTGCCA 

4 01 AACGTTATCT GGAGGTGCGG GGTTTTCAGG GAAAAACCGT CAGCCGTGAG 

451 GACGGGTCGG TTCTGATTGC CGCCAAAAAA GGCAcaatga acaaATGGGG 

501 CTATATCTTT GCccaagtag ctTTGATTGT CATTTGCCTG GGCGGGTTGA 

551 TAGACAGTAA CCTGCTGCTG AAGCTGGGTA TGCTGGCCGG TCGGATTGTT 

601 CCGGACAATC AGGCGGTTTA TGCCAAGGAT TTCAAGCCCG AAAGTATTTT 

651 GGGTGCGTCC AATCTCTCAT TTAGGGGCAA CGTCAATATT TCCGAGGGGC 

701 AAAGTGCGGA TGTGGTTTTC CTGAATGCCG ACAACGGGAT GTTGGTTCAG 

751 GACTTGCCTT TTGAAGTCAA ACTGAAAAAA TTCCATATCG ATTTTTACAA 

801 TACGGGTATG CCGCGCGATT TTGCCAGCGA TATTGAAGTA ACGGACAAGG 

851 CAACCGGTGA GAAACTCGAG CGCACCATCC GCGTGAACCA TCCTTTGACC 

901 TTGCACGGCA TCACGATTTA TCAGGCGAGT TTTGCCGACG GCGGTTCGGA 

951 TTTGACATTC AAGGCGTGGA ATTTGAGGGA TGCTTCGCGC GAACCTGTCG 

1001 TGTTGAAGGC AACCTCCATA CACCAGTTTC CGTTGGAAAT CGGCAAACAC 

1051 AAATATCGTC TTGAGTTCGA TCAGTTCACT TCTATGAATG TGGAGGACAT 

1101 GAGCGAGGGT GCGGAACGGG AAAAAAGCCT GAAATCCACT CTGAACGATG 

1151 TCCGCGCCGT TACTCAGGAA GGTAAAAAAT ACACCAATAT CGGCCCTTCC 

1201 ATCGTGTACC GCATCCGTGA TGcggCAGGG CAGGCGGTCG AATATAAAAA 

1251 CTATATGCTG CCGATTTTGC AGGACAAAGA TTATTTTTGG CTGACCGGCA 

1301 CGCGCAGCGG CTTGCAGCAG CAATACCGCT GGCTGCGTAT CCCCTTGGAC 

1351 AAGCAGTTGA AAGCGGACAC CTTTATGGCA TTGCGTGAGT TTTTGAAAGA 

14 01 TGGGGAAGGG CGCAAACGTC TGGTTGCCGA CGCAACCAAA GACGCACCTG 

1451 CCGAAATCCG CGAACAATTC ATGCTGGCTG CGGAAAACAC GCTGAATATC 

1501 TTTGCGCAAA AAGGCTATTT GGGATTGGAC GAATTTATTA CGTCCAATAT 

1551 CCCGAAAGGG CAGCAGGATA AGATGCAGGG CTATTTCTAC GTWiTGCTTT 

1601 ACGGCGTGAT GAACGCTGCT TTGGATGAAA CCATACGCCG GTACGGCTTG 

1651 CCCGAATGGC AGCAGGATGA AGCGCGGAAC CGTTTCCTGC TGCACAGTAT 

17 01 GGATGCCTAT ACGGGGCTGA CGGAATATCC CGCGCCTATG CTGCTCCAGC 

1751 TTGACGGGTT TTCCGAGGTG CGTTCCTCAG GTTTGCAGAT GACCCGTTCG 

1801 CCGGGTGCGC TTTTGGTCTA TCtcggctcg gtattgttgg TTTTGGgtac 

1851 ggtaTttatg tTTTATGTGC GCGAAAAACG GGCGTGGgta tTGTTTTCag 

1901 aCGGCAAAAT CCGTTTTGCT ATGtCTTcgg CCcgcagcga ACGGGATTTG 

1951 cAGAaggaaT TTCCAAAACA CGtcgAGAGC CTGCAACggc tcggcaaggA 

2001 CttgaaTCAT GACTga 

This corresponds to the amino acid sequence <SEQ ID 336; ORF88ng-l>: 

1 MSKSRISPTL LSRPWFAFFS SMRF AVALLS LLGIASVIGT VL QQNQPQTD 

51 YLVKFGPFWT RIFDFLGLYD VYASAW FVVI MMFLVVSTSL CLI RNVPPFW 

101 REMKSFREKV KEKSLAAMRH SSLLDVKIAP EVAKRYLEVR GFQGKTVSRE 

151 DGSVLIAAKK GTMNKWG YIF AQVALIVICL GGLI DSNLLL KLGMLAGRIV 

201 PDNQAVYAKD FKPESILGAS NLSFRGNVNI SEGQSADVV^F LNADNGMLVQ 

251 DLPFEVKLKK FHIDFYNTGM PRDFASDIEV TDKATGEKLE RTIRVNHPLT 

301 LHGITIYQAS FADGGSDLTF KAWNLRDASR EPWLKATSI HQFPLEIGKH 

351 KYRLEFDQFT SMNVEDMSEG AEREKSLKST LNDVRAVTQE GKKYTNIGPS 

401 IVYRIRDAAG QAVEYKNYML PILQDKDYFW LTGTRSGLQQ QYRWLRIPLD 

451 KQLKADTFMA LREFLKDGEG RKRLVADATK DAPAEIREQF MLAAENTLNI 

501 FAQKGYLGLD EFITSNIPKG QQDKMQGYFY EMLYGVMNAA LDETIRRYGL 
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551 PEWQQDEARN RFLLHSMDAY TGLTEYPAPM LLQLDGFSEV RSSGLQMTRS 
601 PG ftLLVYLGS VLLVLGTVFM FYVREKRAWV LFSDGKIRFA MSSARSERDL 
651 QKEFPKHVES LQRLGKDLNH D* 

ORF88ng-l and ORF88-1 show 97.0% identity in 671 aa overlap: 

orf 88-1 . pep MSKSRRSPPLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGSFWA 60 

I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II: 
orf88ng-l MSKSRISPTLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGPFWT 60 

orf 88-1 . pep QIFGFLGLYDVYASAWFWIMMFLWSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 120 

: I I I I II II II II I I I I I I I I I I I II I I I I I I I I I I I I II II I II I II II I I I I I I I I I 
orf88ng-l RIFDFLGLYDVYASAWFWIMMFLWSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 120 

orf 88-1 .pep SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL 180 

orf88ng-l SSLLDVKIAPEVAKRYLEVRGFQGKTVSREDGSVLIAAKKGTMNKWGYIFAQVALIVICL 180 

orf 88-1 .pep GGLIDSNLLLKLGMLTGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNISEGQSADWF 240 

II I II I II I II I I I : II I I II II II I II II II I II II I I I I II II II I II I II II II II 
orf88ng-l GGLIDSNLLLKLGMLAGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNISEGQSADWF 240 

orf 88-1. pep LNADNGILVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNHPLT 300 

orf88ng-l LNADNGMLVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNHPLT 300 

orf 88-1 .pep LHGITIYQASFADGGSDLTFKAWNLGDASREPVVLKATSIHQFPLEIGKHKYRLEFDQFT 360 

I I I I I I I I I I I I I I I I II I I I I I II I I I I I I II I I I I I I I I I I I I I II I I II I II I I I I 
orf88ng-l LHGITIYQASFADGGSDLTFKAWNLRDASREPWLKATSIHQFPLEIGKHKYRLEFDQFT 3 60 

orf 88-1 .pep SMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYML 420 

orf88ng-l SMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYML 420 

orf 88-1 .pep PVLQEQDYFWITGTRSGLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRLVADATK 480 

orf88ng-l PILQDKDYFWLTGTRSGLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRLVADATK 480 

orf 88-1. pep GAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKEQQDKMQGYFYEMLYGVMNAA 540 

orf88ng-l DAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKGQQDKMQGYFYEMLYGVMNAA 54 0 

orf 8 8-1. pep LDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRS 600 

1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 

orf88ng-l LDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRS 600 

orf 8 8-1. pep PGALLVYLGSVLLVLGTVLMFYVREKRAWVLFSDGKIRFAMSSARSERDLQKEFPKHVES 6 60 

orf88ng-l PGALLVYLGSVLLVLGTVFMFYVREKRAWVLFSDGKIRFAMSSARSERDLQKEFPKHVES 660 

orf 8 8-1. pep LQRLGKDLNHD 671 

I I I II I I I I I I 
orf88ng-l LQRLGKDLNHD 671 

Furthermore, ORG88ng-l shows homology with a hypothetical protein from Aquifex aeolicus: 

gi 1 2984296 (AE000771) hypothetical protein [Aquifex aeolicus] Length = 537 
Score =94.4 bits (231), Expect = 2e-18 

Identities = 91/334 (27%), Positives = 159/334 (47%), Gaps = 59/334 (17%) 

Query: 16 FAFFSSMRFAVALLSLLGIASVIG-TVLQQNQPQTDYLVKFGPFWTRIFDFLGLYDVYAS 7 4 

+ F +S++ A+ ++ +LGI S++G T ++QNQ YL +FG L L DV+ S 

Sbjct: 80 YDFLASLKLAIFIMLVLGILSMLG3TYIKQNQSFEWYLDQFGYDVGIWIWKLWLNDVFHS 139 

Query: 7 5 AWFVVIMMFLWST3LCLIRNVPPFWREMKSFREKVKEKSLAAMRHSSLLDVKIAPEVAK 134 

++++ ++ L V+ C 1+ +P W++ S +E++ + A +H + VKI P+ K 
Sbjct: 140 WYYILFIVLLAVNLIFCSIKRLPRVWKQAFS-KERILKLDEHAEKHLKPITVKI-PDKDK 197 

Query: 135 — RYLEVRGFQGKTVSREDGSVLIAAKKGTMNKWGYIFAQVALIVICLGGLIDSNLLLKL 192 

++L +GF+ V E + + A+KG ++ G +AL+VI G LID 
Sbjct: 198 VLKFLLKKGFK-VFVEEEGNKLYVFAEKGRFSRLGVYITHIALLVIMAGALID 249 
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Query: 193 GMLAGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNISEGQSADWFLNADNGMLVQDL 252 

+ I+G RG++ ++EG + DV+ + A+ L 

Sbjct: 250 AIVGV RGSLIVAEGDTNDVMLVGAE— QKPYKL 280 

Query: 253 PFEVKLKKFHIDFY NTGMPRDFA SDIEVTDKATGEKLER— TIRVNHPLT 300 

PFVLFiy N++FA SDIE+ + G K+E T++VN P 
Sbjct: 281 PFAVHLIDFRIKTYAEENPNVDKRFAQAVSSYESDIEIIN GGKVEAKGTVKVNEPFD 337 

Query: 301 LHGITIYQASFA— DGGSDLTFKAWNLRDASREP 332 

++QA++ DG S + + + A +P 

Sbjct: 338 FGRYRLFQATYGILDGTSGMGVIWDRKKAHEDP 371 

Based on this analysis, including the putative transmembrane domain in the gonococcal protein, 
it is predicted that the proteins from N. meningitidis and N.gonorrhoeae, and their epitopes, could 
be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 40 



The following DNA sequence, believed to be complete, was identified in N.meningitidis <SEQ ID 
337>: 



1 ATGATGAGTA ATAiiiAATGGiti ACAAAAAGGG TTTACATTGA TTGmGmTGAT 

51 GATAGTCGTC GCGATACTCG GCATTATCAG CGTCATTGCC ATACCTTCTT 

101 ATCmAAGTTA TATTGAAAAA GGCTATCAGT CCCAGCTTTA TACGGAGATG 

151 GyCGGTATCA ACAATATTTC CAAACAGTTT ATTTTGAAAA ATCCCCTGGA 

201 CGATAATCAG ACCATCGAGA ACAAACTGGA AATATTTGTC TCAGGCTATA 

251 AGATGAATCC GAAAATTGCC AAAAAaTATA GTGTTTCGGT AAAGTTTGTC 

301 GATAAGGAAA AATCAAGGGC ATACAGGTTG GTCGGCGTTC CGAAGGCGGG 

351 GACGGGTTAT ACTTTGTCGG TATGGATGAA CAGCGTGGGC GACGGATACA 

4 01 AATGCCGTGA TGCCGCTTCT GCCCAAGCCC ATTTGGAGAC CTTGTCCTCA 

4 51 GATGTCGGCT GTGAAGCCTT CTCTAATCGT AAAAAATAA 

This corresponds to the amino acid sequence <SEQ ID 338; ORF89>: 

1 MMSNXMXQKG FTLIXXMIW AILGIISVIA IPSYXSYIEK GYQSQLYTEM 

51 XGINNISKQF ILKNPLDDNQ TIENKLEIFV SGYKMNPKIA KKYSVSVKFV 

101 DKEKSRAYRL VGVPKAGTGY TLSVWMNSVG DGYKCRDAAS AQAHLETLSS 

151 DVGCEAFSNR KK* 

Further work revealed the complete nucleotide sequence <SEQ ID 339>: 

1 ATGATGAGTA ATAAAATGGA ACAAAAAGGG TTTACATTGA TTGAGATGAT 

51 GATAGTCGTC GCGATACTCG GCATTATCAG CGTCATTGCC ATACCTTCTT 

101 ATCAAAGTTA TATTGAAAAA GGCTATCAGT CCCAGCTTTA TACGGAGATG 

151 GTCGGTATCA ACAATATTTC CAAACAGTTT ATTTTGAAAA ATCCCCTGGA 

201 CGATAATCAG ACCATCGAGA ACAAACTGGA AATATTTGTC TCAGGCTATA 

251 AGATGAATCC GAAAATTGCC AAAAAATATA GTGTTTCGGT AAAGTTTGTC 

301 GATAAGGAAA AATCAAGGGC ATACAGGTTG GTCGGCGTTC CGAAGGCGGG 

351 GACGGGTTAT ACTTTGTCGG TATGGATGAA CAGCGTGGGC GACGGATACA 

4 01 AATGCCGTGA TGCCGCTTCT GCCCAAGCCC ATTTGGAGAC CTTGTCCTCA 

4 51 GATGTCGGCT GTGAAGCCTT CTCTAATCGT AAAAAATAA 

This corresponds to the amino acid sequence <SEQ ID 340; ORF89-l>: 

1 MMSNKMEQKG FTLIEMMIW AILGIISVIA IPSYQSYIEK GYQSQLYTEM 

51 VGINNISKQF ILKNPLDDNQ TIENKLEIFV SGYKMNPKIA KKYSVSVKFV 

101 DKEKSRAYRL VGVPKAGTGY TLSVWMNSVG DGYKCRDAAS AQAHLETLSS 

151 DVGCEAFSNR KK* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with PilE of N. gonorrhoeae (accession number Z69260). 



ORF89 and PilE protein show 30% aa identity in 120a overlap: 
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QKGFTLIXXMIWAILGIISVIAIPSYXSYIEKGYQSQLYTEMXGINNISKQFILKNPL- 66 
QKGFTLI MIV+AI+GI++ +A+P+Y Y+ S+ G+ ++L++ 

QKGFTLIELMIVIAIVGILAAVALPAYQDYTARAQVSEAILLAEGQKSAVTEYYLNHGIW 64 



orf89 8 
PilE 5 
orf89 67 
PilE 65 

Homology with a predicted ORF from N.meninsitidis (strain A) 

ORF89 shows 83.3% identity over a 162aa overlap with an ORF (ORF89a) from strain A of A^. 
meningitidis: 



-DDNQTIENKLEIFVSGYKMNPKIAKKYSVSVKFVDKEKSRAYRLVGVPKAGTGYTLSVW 125 

DN + +G + KI KY SV + GV K G LS+W 

PKDNTS AGVASSDKIKGKYVQSVTVAKGWTAEMASTGVNKEIQGKKLSLW 115 



orf89.pep 
orf89a 



MMSNXMXQKGFTLIXXMIWAILGIISVIAIPSYXSYIEKGYQSQLYTEMXGINNISKQF 
MMSNKMEQKGFTLIXXXXXXAIXXXXSVIXXXXYXSYIEKGYQSQLYTEMVGINNISKQX 



or f 8 9. pep 
orf89a 



ILKNPLDDNQTIENKLEIFVSGYKMNPKIAKKYSVSVKFVDKEKSRAYRLVGVPKAGTGY 
ILKNPLDDNQTIKSKLEIFVSGYKMNPKIAEKYNVSVHFVNEEKPRAYSLVGVPKTGTGY 



TLSVWMNSVGDGYKCRDAASAQAHLETLSSDVGCEAFSNRKKX 

I I I I I I I I I ! I I I I I I I I I I I : I I I I I I ! I i I I j i I i I I I I I I 
TLSVWMNSVGDGYKCRDAASARAHLETLSSDVGCEAFSNRKKX 



The complete length ORF89a nucleotide sequence <SEQ ID 341 > is: 



ATGATGAGTA ATAAAATGGA 
NATNGNCNTC GCGATACNCN 
ATCNNAGTTA TATTGAAAAA 
GTCGGTATCA ACAATATTTC 
CGATAATCAG ACCATCAAGA 
AGATGAATCC GAAAATTGCC 
AATGAGGAAA AACCNAGGGC 
GACGGGTTAT ACTTTGTCGG 
AATGCCGTGA TGCCGCTTCT 
GATGTCGGCT GTGAAGCCTT 



ACAAAAAGGG TTTACATTGA TTGNGANGNT 
GCNTTANCAG CGTCATTNCN ATNNNTNCNT 
GGCTATCAGT CCCAGCTTTA TACGGAGATG 
CAAACAGTNT ATTTTGAAAA ATCCCCTGGA 
GCAAACTGGA AATATTTGTC TCAGGCTATA 
GAAAAATATA ATGTTTCGGT GCATTTTGTC 
ATACAGCTTG GTCGGCGTTC CAAAGACGGG 
TATGGATGAA CAGCGTGGGC GACGGATACA 
GCCCGAGCCC ATTTGGAGAC CTTGTCCTCA 
CTCTAATCGT AAAAAATAG 



This encodes a protein having amino acid sequence <SEQ ID 342>: 

1 MMSNKMEQKG FTLIXXXXXX AIXXXXSVIX XXXYXSYIEK GYQSQLYTEM 

51 VGINNISKQX ILKNPLDDNQ TIKSKLEIFV SGYKMNPKIA EKYNVSVHFV 

101 NEEKPRAYSL VGVPKTGTGY TLSVWMNSVG DGYKCRDAAS ARAHLETLSS 

151 DVGCEAFSNR KK* 

ORF89a and ORF89-1 show 83.3% identity in 162 aa overlap: 



orf8 9a.pep MMSNKMEQKGFTLIXXXXXXAIXXXXSVIXXXXYXSYIEKGYQSQLYTEMVGINNISKQX 
orf89-l MMSNKMEQKGFTLIEMMIWAILGIISVIAIPSYQSYIEKGYQSQLYTEMVGINNISKQF 



ILKNPLDDNQTIKSKLEIFVSGYKMNPKIAEKYNVSVHFVNEEKPRAYSLVGVPKTGTGY 
ILKNPLDDNQTIENKLEIFVSGYKMNPKIAKKYSVSVKFVDKEKSRAYRLVGVPKAGTGY 



TLSVWMNSVGDGYKCRDAASARAHLETLSSDVGCEAFSNRKKX 
TLSVWMNSVGDGYKCRDAASAQAHLETLSSDVGCEAFSNRKKX 
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Homology with a predicted ORF from N.zonorrhoeae 

ORF89 shows 84.6% identity over a 162aa overlap with a predicted ORF (ORF89.ng) from A^. 
gonorrhoeae: 

orf89 MMSKXMXQKGFTLIXXMIWAILGIISVIAIPSYXSYIEKGYQSQLYTEMXGINNISKQF 60 

orf89ng MMSNKMEQKGFTLIEMMIWTILGIISVIAIPSYQSYIEKGYQSQLYTEMVGINNVLKQF 60 

orf8 9 ILKNPLDDNQTIENKLEIFVSGYKMNPKIAKKYSVSVKFVDKEKSRAYRLVGVPKAGTGY 120 

orf89ng ILKNPQDDNDTLKSKLKIFVSGYKMNPKIAKKYSVSVRFVDAEKPRAYRLVGVPNAGTGY 120 

orf8 9 TLSVWMNSVGDGYKCRDAASAQAHLETLSSDVGCEAFSNRKK 152 

orf8 9ng TLSVWMNSVGDGYKCRDATSAQAYSDTLSADSGCEAFSNRKK 152 

The complete length ORF89ng nucleotide sequence <SEQ ID 343> is: 

1 aTGATGAGCA ATAAAATGGA ACAAAAAGGG TTTACATTGA TTGAGATGAT 

51 GATAGTTGTC ACGATACTCG GCATCATCAG CGTCATTGCC ATACCTTCTT 

101 ATCAGAGTTA TATTGAAAAA GGCTATCAGT CCCAGCTTTA TACGGAGATG 

151 GTCGGTATCA ACAATGTTCT CAAACAGTTT ATTTTGAAAA ATCCCCAGGA 

201 CGATAATGAT ACCCTCAAGA GCAAACTGAA AATATTTGTC TCAGGCTATA 

251 AGATGAATCC GAAAAttgCC AAAAAATATA GTGTTTCGGt aaggtttGTC 

301 gatGCGGAAA AACCAAGGGC ATACAGGTTG GTCGGCGTTC CGAACGCGGG 

351 GACGGGTTAT ACTTTGTCGG TATGGATGAA CAGCGTGGGC GACGGATACA 

401 AATGCCGTGA TGCCACTTCT GCCCAGGCCT ATTCGGACAC CTTGTCCGCA 

451 GATAGCGGCT GTGAAGCTTT CTCTAATCGT A7\AA7\ATAG 

This encodes a protein having amino acid sequence <SEQ ID 344>: 

1 MMSNKMEQKG FTLIEMMIW TILGIISVI A IPSYQSYIEK GYQSQLYTEM 
51 VGINNVLKQF ILKNPQDDND TLKSKLKIFV SGYKMNPKIA KKYSVSVRFV 
101 DAEKPRAYRL VGVPNAGTGY TLSVWMNSVG DGYKCRDATS . AQAYSDTLSA 
151 DSGCEAFSNR KK* 

This gonococcal protein has a putative leader peptide (underlined) and N-terminal methylation site 
(NMePhe or type-4 pili, double-underlined). In addition, ORF89ng and ORF89-1 show 88.3% 
identity in 162 aa overlap: 

10 20 30 40 50 60 

orf8 9-l.pep MMSNKMEQKGFTLIEMMIVVAILGIISVIAIPSYQSYIEKGYQSQLYTEMVGINNISKQF 

or f 8 9ng MMSNKMEQKGFTLIEMMIWTILGIISVIAI PSYQSYIEKGYQSQLYTEMVGINNVLKQF 

10 20 30 40 50 60 

70 80 90 100 110 120 

or f 8 9- 1 . pep I LKNPLDDNQT lENKLE I FVSGYKMNPKIAKKYSVS VKFVDKEKSRAYRLVGVPKAGTGY 

orf89ng ILKNPQDDNDTLKSKLKIFVSGYKMNPKIAKKYSVSVRFVDAEKPRAYRLVGVPNAGTGY 
70 80 90 100 110 120 

130 140 150 160 

orf 89-1 . pep TLSVWMNSVGDGYKCRDAASAQAHLETLSSDVGCEAFSNRKKX 

orf89ng TLSVWMNSVGDGYKCRDATSAQAYSDTLSADSGCEAFSNRKKX 
130 140 150 160 

Based on this analysis, including the gonococcal motifs and the homology with the known PilE 
protein, it was predicted that these proteins from N. meningitidis and N. gonorrhoeae, and their 
55 epitopes, could be useftil antigens for vaccines or diagnostics, or for raising antibodies. 
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ORP'89-1 (13.6kDa) was cloned in the pGex vector and expressed in E.coli, as described above. 
The products of protein expression and purification were analyzed by SDS-PAGE. Figure 1 1 A 
shows the results of affinity purification of the GST- fusion protein. Purified GST-fiision protein 
was used to immunise mice, whose sera gave a positive result in the ELISA test., confirming that 
ORF89-1 is a surface-exposed protein, and that it is a useful inununogen. 

Example 41 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 345>: 

1 ATGAAAAAAT CCTCCCTCAT CAGCGCATTG GGCATCGGTA TTTTGAGCAT 

51 CGGCATGGCA TTTGCCGCCC CTGCCGACGC GGTAAGCCAA ATCCGTCAAA 

101 ACGCCACTCA AGTATTGAGC ATCTTAAAAA ACGGCGATGC CAACACCGCT 

151 CGCCAAAAAG CCGAAGCCTA TGCGATTCCC TATTTCGATT TCCAACGTAT 

201 GACCGCATTG GCGGTCGGCA ACCCTTGGsG CACCG.GTCC GACG.GCAAA 

251 AACAAGCGTT GGCCn.AGAA TTTCAACCC. . . 

This corresponds to the amino acid sequence <SEQ ID 346; 0RF91>: 



Further work revealed the complete nucleotide sequence <SEQ ID 347>: 

1 ATGAAAAAAT CCTCCCTCAT CAGCGCATTG GGCATCGGTA TTTTGAGCAT 

51 CGGCATGGCA TTTGCCGCCC CTGCCGACGC GGTAAGCCAA ATCCGTCAAA 

101 ACGCCACTCA AGTATTGAGC ATCTTAAAAA ACGGCGATGC CAACACCGCT 

151 CGCCAAAAAG CCGAAGCCTA TGCGATTCCC TATTTCGATT TCCAACGTAT 

2 01 GACCGCATTG GCGGTCGGCA ACCCTTGGCG CACCGCGTCC GACGCGCAAA 

251 AACAAGCGTT GGCCAAAGAA TTTCAAACCC TGCTGATCCG CACCTATTCC 

301 GGCACGATGC TGAAATTAAA AAACGCCAAC GTCAACGTCA AAGACAATCC 

351 CATCGTCAAT AAAGGCGGCA AAGAAATCAT CGTCCGCGCC GAAGTCGGCG 

4 01 TACCCGGGCA AAAACCCGTC AACATGGACT TCACCACCTA CCAAAGCGGC 

4 51 GGTAAATACC GTACCTACAA CGTCGCCATC GAAGGCGCGA GCCTGGTTAC 

501 CGTGTACCGC AACCAATTCG GCGAAATTAT CAAAGCGAAA GGCGTGGACG 

551 GACTGATTGC CGAGTTGAAA GCCAAAAACG GCGGCAAATA A 

This corresponds to the amino acid sequence <SEQ ID 348; 0RF9I-1>: 

1 MKKSSLISAL GIGILSIGMA FA APADAVSQ IRQNATQVLS ILKNGDANTA 

51 RQKAEAYAIP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS 

101 GTMLKLKNAN VNVKDNPIVN KGGKEIIVRA EVGVPGQKPV NMDFTTYQSG 

151 GKYRTYNVAI EGASLVTVYR NQFGEIIKAK GVDGLIAELK AKNGGK* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF fi-om N.menimitidis Tstrain A) 

0RF91 shows 92.4% identity over a 92aa overlap with an ORF (0RF91a) from strain A of A^. 

meningitidis: 

10 20 30 40 50 60 

orf 91.pep MKKSSLISALGIGILSIGMAFAAPADAVSQIRQNATQVLSILKNGDANTARQKAEAYAIP 

orf91a MKKSSFISALGIGILSIGMAFAAPADAVNQIRQNATQVLSILKSGDANTARQKAEAYAIP 



YFDFQRMTALAVGNPWXTXS DXQKQALAXE FQP 

1 I I I I I I I I I I I I I I I I II I I I I I I III 

YFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYSGTMLKLKNANVNVKDNPIVN 
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orf91a KGGKEIIVRAEVGVPGQKPVNMDFTTYQ3GGKYRTYNVAIEGASLVTVYRNQFGEIIKAK 
130 140 150 160 170 180 

The complete length 0RF91a nucleotide sequence <SEQ ID 349> is: 

1 ATGAAAAAAT CCTCCTTCAT CAGCGCATTG GGCATCGGTA TTTTGAGCAT 

51 CGGCATGGCA TTTGCCGCCC CTGCCGACGC GGTAAACCAA ATCCGTCAAA 

101 ACGCCACTCA AGTATTGAGC ATCTTAAAAA GCGGTGATGC CAACACCGCC 

151 CGCCAAAAAG CCGAAGCCTA TGCGATTCCC TATTTCGATT TCCAACGTAT 

201 GACCGCATTG GCGGTCGGCA ACCCTTGGCG CACCGCGTCC GACGCGCAAA 

251 AACAAGCGTT GGCCAAAGAA TTTCAAACCC TGCTGATCCG CACCTATTCC 

301 GGCACGATGC TGAAATTAAA AAACGCCAAC GTCAACGTCA AAGACAATCC 

351 CATCGTCAAT AAAGGCGGCA AAGAAATCAT CGTCCGCGCC GAAGTCGGCG 

401 TACCCGGGCA AAAACCCGTC AACATGGACT TCACCACCTA CCAAAGCGGC 

451 GGTAAATACC GTACCTACAA CGTCGCCATC GAAGGCGCGA GCCTGGTTAC 

501 CGTGTACCGC AACCAATTCG GCGAAATTAT CAAAGCGAAA GGCGTGGACG 

551 GACTGATTGC CGAGTTGAAG GCTAAAAACG GCAGCAAGTA A 

This encodes a protein having amino acid sequence <SEQ ID 350>: 

1 MKKSSFISAL GIGILSIGMA FA APADAWQ IRQNATQVLS ILKSGDANTA 

51 RQKAEAYAIP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS 

101 GTMLKLKNAN VNVKDNPIW KGGKEIIVRA EVGVPGQKPV NMDFTTYQSG 

151 GKYRTYNVAI EGASLVTVYR NQFGEIIKAK GVDGLIAELK AKNGSK* 

0RF91a and ORF91-1 show 98.0% identity in 196 aa overlap: 

10 20 30 40 50 60 

orf gia.pep MKKSSFISALGIGILSIGMAFAAPADAVNQIRQNATQVLSILKSGDANTARQKAEAYAIP 

orf 91-1 MKKSSLISALGIGILSIGMAFAAPADAVSQIRQNATQVLSILKNGDANTARQKAEAYAIP 



YFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYSGTMLKLKNANVNVKDNPIVN 
YFDFQRMTALAVGNPWRTA3DAQKQALAKEFQTLLIRTYSGTMLKLKNANVNVKDNPIVN 



orf 91a. pep KGGKEIIVRAEVGVPGQKPVNMDFTTYQSGGKYRTYNVAIEGASLVTVYRNQFGEIIKAK 

o r f 9 1 - 1 KGGKE 1 1 VRAEVGVPGQKPVNMDFTT YQSGGKYRTYNVAIEGASLVTVYRNQFGEIIKAK 

130 140 150 160 170 180 

190 

orf 91a. pep GVDGLIAELKAKNGSKX 

orf91-l GVDGLIAELKAKNGGKX 
190 

Homology with a predicted ORF from N.sonorrhoeae 

0RF91 shows 84.8% identity over a 92aa overlap with a predicted ORF (0RF91.ng) from N. 
gonorrhoeae: 

orf 91 . pep MKKSSLISALGIGILSIGMAFAAPADAVSQIRQNATQVLSILKNGDANTARQKAEAYAIP 60 

orf91ng VKKSSFISALGIGILSIGMAFASPADAVGQIRQNATQVLTILKSGDAASARPKAEAYAVP 60 

orf 91. pep YFDFQRMTALAVGNPWXTXSDXQKQALAXEFQP 93 

I 1 I I I I I I I I I ! I I I I I II I I I I I I III 
orfging YFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYSGTMLKFKNATVNVKDNPIVN 120 

The complete length 0RF91ng nucleotide sequence <SEQ ID 35 1> is predicted to encode a protein 
having amino acid sequence <SEQ ID 352>: 
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1 VKKSSFISAL GIGILSIGMA FA SPADRVGQ IRQNATQVLT ILKSGDAASA 

51 RPKAEAYAVP YFDFQRMTAL AVGNPWRTA3 DAQKQALAKE FQTLLIRTYS 

101 GTMLKFKNAT VNVKDNPIVN KGGKEIWRA EVGIPGQKPV NMDFTTYQSG 

151 GKYRTYNVAI EGTSLVTVYR NQFGEIIECAK GIDGLIAELK AKEGGK* 

Further work revealed the complete nucleotide sequence <SEQ ID 353>: 

1 ATGAAAAAAT CCTCCTTCAT CAGCGCATTG GGCATCGGTA TTTTGAGCAT 

51 CGGCATGGCA TTTGCCTCCC CGGCCGACGC AGTGGGACAA ATCCGCCAAA 

101 ACGCCACACA GGTTTTGACC ATCCTCAAAA GCGGCGACGC GGCTTCTGCA 

151 CGCCCAAAAG CCGAAGCCTA TGCGGTTCCC TATTTCGATT TCCAACGTAT 

201 GACCGCATTG GCGGTCGGCA ACCCTTGGCG TACCGCGTCC GACGCGCAAA 

251 AACAAGCGTT GGCCAAAGAA TTTCAAACCC TGCTGATCCG CACCTATTCC 

301 GGCACGATGC TGAAATTCAA AAACGCGACC GTCAACGTCA AAGACAATCC 

351 CATCGTCAAT AAGGGCGGCA AGGAAATCGT CGTCCGTGCC GAAGTCGGCA 

401 TCCCCGGTCA GAAGCCCGTC AATATGGACT TTACCACCTA CCAAAGCGGC 

451 GGCAAATACC GTACCTACAA CGTCGCCATC GAAGGCACGA GCCTGGTTAC 

501 CGTGTACCGC AACCAATTCG GCGAAATCAT CAAAGCCAAA GGCATCGACG 

551 GGCTGATTGC CGAGTTGAAA GCCAAAAACG GCGGCAAATA A 

This corresponds to the amino acid sequence <SEQ ID 354; 0RF91ng-l>: 



1 MKKSSFISAL GIGILSIGMA FA SPADAVGQ IRQNATQVLT ILKSGDAASA 

51 RPKAEAYAVP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS 

101 GTMLKFKNAT VNVKDNPIVN KGGKEIWRA EVGIPGQKPV NMDFTTYQSG 

151 GKYRTYNVAI EGTSLVTVYR NQFGEIIKAK GIDGLIAELK AKNGGK* 

0RF91ng-l and 0RF91-1 show 92.3% identity in 196 aa overlap: 



orf 91-1 . pep MKKSSLISALGIGILSIGMAFAAPADAVSQIRQNATQVLSILKNGDANTARQKAEAYAIP 
orf91ng-l MKKSSFISALGIGILSIGMAFASPADAVGQIRQNATQVLTILKSGDAASARPKAEAYAVP 



irf 91-1 . pep YFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYSGTMLKLKNANVNVKDNPIVN 
>rf91ng-l YFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYSGTMLKFKNATVNVKDNPIVN 



130 140 150 160 170 180 

orf 91-1. pep KGGKEIIVRAEVGVPGQKPVNMDFTTYQSGGKYRTYNVAIEGASLVTVYRNQFGEIIKAK 
I I I I I I : I I I I I I : I I I I ! I I I I I I I I I I I I I I I I I i I i I I I : I I I I I I I I I I I I I I I I I 
orf91ng-l KGGKEIWRAEVGIPGQKPVNMDFTTYQSGGKYRTYNVAIEGTSLVTVYRNQFGEIIKAK 

130 140 150 160 170 180 



)rf 91-1. pep GVDGLIAELKAKNGGKX 
I : I I I I I I I I I I I I I M 
)rf91ng-l GIDGLIAELKAKNGGKX 



In addition, 0RF9Ing-l shows homology to a hypothetical E.coli protein: 



sp|P45390|YRBC_ECOLI HYPOTHETICAL 24.0 KD PROTEIN IN MURA-RPON INTERGENIC 
REGION PRECURSOR (F211) >gi 1 606130 (D18997) 0RF_f211 [Escherichia coli] 
>gi 1 1789583 (AE000399) hypothetical 24.0 kD protein in murZ-rpoN intergenic 
region [Escherichia coli] Length = 211 

Score =70.6 bits (170), Expect = 6e-12 

Identities = 42/137 (30%), Positives = 76/137 (54%), Gaps = 6/137 (4%) 

Query: 59 VPYFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYSGTMLKFKNATVNVKDNPI 118 

+PY + AL +G +++A+ AQ++A F+L + Y + + T + P 
Sbjct: 65 LPYVQVKYAGALVLGQYYKSATPAQREAYFAAFREYLKQAYGQALAMYHGQTYQIA— PE 122 



Query: 119 VNKGGKEIV-VRAEVGIP-GQKPVNMDFTTYQSG—GKYRTYNVAIEGTSLVTVYRNQFG 174 

G K IV +R + P G+ PV +DF ++ G ++ Y++ EG S++T +N++G 
Sbjct: 123 QPLGDKTIVPIRVTIIDPNGRPPVRLDFQWRKNSQTGNWQAYDMIAEGVSMITTKQNEWG 182 
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Query: 175 EIIKAKGIDGLIAELKA 191 

+++ KGIDGL A+LK+ 
Sbjct: 183 TLLRTKGIDGLTAQLKS 199 

Based on this analysis, including the presence of a putative leader sequence in the gonococcal 
protein, it is predicted that the proteins from N. meningitidis and N.gonorrhoeae, and their epitopes, 
could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 42 

The following DNA sequence was identified in N. meningitidis <SEQ ID 355>: 

1 ATGAAACACA TACTCCCCCT GATTGCCGCA TCCGCACTCT GCATTTCAAC 

51 CGCTTCGGCA CATCCTGCCA GCGAACCGTC CACTCAAAAC GAAACCGCTA 

101 TGATCACGCA TACCCTCATC TCAAAATACA GTTTTGGnnn nnnnnnnnnn 

151 nnnnnnnnnn nnGCCATAAA AAGCAAAGGG ATGGACATTT TTGCCGTCAT 

201 CGACCATCAG GAAGCCGCAC GCCGAAACGG CTTAACGATG CAGCCGGCAA 

251 AAGTCATCGT CTTCGGCACG CCCAAAGCCG GCACGCCGCT GATGGTCAAA 

301 GACCCCGCCT TCGCCCTGCA ACTGCCCCTA CGCGTCCTCG TTACCGAAAC 

351 GGACGGCAAA GTACGCGCCG CCTATACCGA TACGCGCGCC CTCATCGCCG 

401 GCAGCCGCAT CGGTTTCGAC GAAGTGGCAA ACACTTTGGC AAACGCCGAA 

451 AAACTGATAC AAAAAACCGT AGGCGAATAA 

This corresponds to the amino acid sequence <SEQ ID 356; ORF97>: 

1 MKHILPLIAA SALCISTASA HPASEPSTQN ETAMITHTLI SKYSFGXXXX 

51 XXXXAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK 

101 DPAFALQLPL RVLVTETDGK VRAAYTDTRA LIAGSRIGFD EVANTLANAE 

151 KLIQKTVGE* 

Further work revealed the complete nucleotide sequence <SEQ ED 357>: 

1 ATGAAACACA TACTCCCCCT GATTGCCGCA TCCGCACTCT GCATTTCAAC 

51 CGCTTCGGCA CATCCTGCCA GCGAACCGTC CACCCAAAAC GAAACCGCTA 

101 TGACCACGCA TACCCTCACC TCAAAATACA GTTTTGACGA AACCGTCAGC 

151 CGCCTTGAAA CCGCCATAAA AAGCAAAGGG ATGGACATTT TTGCCGTCAT 

201 CGACCATCAG GAAGCCGCCC GCCGAAACGG CTTAACGATG CAGCCGGCAA 

251 AAGTCATCGT CTTCGGCACG CCCAAAGCCG GCACGCCGCT GATGGTCAAA 

301 GACCCCGCCT TCGCCCTGCA ACTGCCCCTA CGCGTCCTCG TTACCGAAAC 

351 GGACGGCAAA GTACGCGCCG CCTATACCGA TACGCGCGCC CTCATCGCCG 

401 GCAGCCGCAT CGGTTTCGAC GAAGTGGCAA ACACTTTGGC AAACGCCGAA 

451 AAACTGATAC AAAAAACCGT AGGCGAATAA 

This corresponds to the amino acid sequence <SEQ ED 358; ORF97-l>: 

1 MKHILPLIAA SALCISTASA HPASEPSTQN ETAMTTHTLT SKYSFDETVS 

51 RLETAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK 

101 DPAFALQLPL RVLVTETDGK VRAAYTDTRA LIAGSRIGFD EVANTLANAE 

151 KLIQKTVGE* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORE from N.meninsitidis (strain 

ORF97 shows 88.7% identity over a 159aa overlap with an ORE (ORF97a) from strain A ofN. 
meningitidis: 

10 20 30 40 50 50 

orf 97.pep MKHILPLIAASALCISTASAHPASEPSTQNETAMITHTLISKYSFGXXXXXXXXAIKSKG 

or f 97 a MXHILPLXXASALCISTASXHPASEPQTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG 
10 20 30 40 50 60 
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orf97 .pep MDIFAVIDHQEAARRNGLTMQPAPCVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK 
orf97a MDIFAVIDHQEAARRNGLTMQPAICVIVFGTPKAGTPLMVKDPAFALQLPLRVXVTETDGK 



orf 97 . pep VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTVGEX 

orf97a VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTIGEX 
130 140 150 160 

The complete length ORF97a nucleotide sequence <SEQ ID 359> is: 

1 ATGANACACA TACTCCCCCT GANTGNCGCA TCCGCACTCT GCATTTCAAC 

51 CGCTTCGGNN CATCCTGCCA GCGAACCGCA AACCCAAAAC GAAACCGCTA 

101 TGACCACGCA TACCCTCACC TCAAAATACA GTTTTGACGA AACCGTCAGC 

151 CGCCTTGAAA CCGCCATAAA AAGCAAAGGG ATGGACATTT TTGCCGTCAT 

201 CGACCATCAG GAAGCCGCCC GCCGAAACGG CTTAACGATG CAGCCGGCAA 

251 AAGTCATCGT CTTCGGCACG CCCAAAGCCG GTACGCCGCT GATGGTCAAA 

301 GACCCCGCCT TCGCCCTGCA ACTGCCCCTG CGCGTCNTCG TTACCGAAAC 

351 GGACGGCAAA GTACGCGCCG CCTATACCGA TACGCGCGCC CTCATCGCCG 

401 GCAGCCGCAT CGGTTTCGAC GAAGTGGCAA ACACTTTGGC AAACGCCGAA 

451 AAACTGATAC AAAAAACCAT AGGCGAATAA 

This encodes a protein having amino acid sequence <SEQ ID 360>: 

1 MXHILPLXXA SALCISTASX HPASEPQTQN ETAMTTHTLT SKYSFDETVS 

51 RLETAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK 

101 DPAFALQLPL RVXVTETDGK VRAAYTDTRA LIAGSRIGFD EVANTLANAE 

151 KLIQKTIGE* 

ORF97a and ORF97-1 show 95.6% identity in 159 aa overlap: 



10 20 30 40 50 60 

orf 97a . pep MXHILPLXXASALCISTASXHPASEPQTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG 
I Mill II I I I II I II II I II I : II I I I I II I I I I I I II II I I I I I I I I I I I II I I 
orf 97-1 MKHILPLIAASALCISTASAHPASEPSTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG 
10 20 30 40 50 60 



70 80 90 100 110 120 

orf 97a . pep MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVXVTETDGK 

I I I I II I I I I I I I I I I I I II I I M I I I I II II 1 I I I II I I I I I II I I I II II I II I M I 
orf 97-1 MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK 
70 80 90 100 110 120 



130 140 150 160 

orf 97a . pep VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTIGEX 

I I I II I I I I I I i I I I I I I I I I I M I II I I I I I I I I I : I M 
orf 97-1 VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTVGEX 
130 140 150 160 



Homologv with a predicted ORF from N.sonorrhoeae 



ORF97 shows 
gonorrhoeae: 

orf 97 .pep 
orf97ng 
orf 97. pep 
orf97ng 
orf 97. pep 
orf97ng 



.1% identity over a 159aa overlap with a predicted ORF (ORF97.ng) from A^. 



MKHILPLIAASALCISTASAHPASEPSTQNETAMXTHTLISKYSFGXXXXXXXXAIKSKG 60 
II I I I I I I I I I : I I I I I I I I 1 I :: I I I I I I I I I I I I I I 11 I : : I II I I I 

MKHILPPIAASAFCISTASAHPAGKPPTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG 60 

MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK 120 
I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I 
MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPJCAGTPLMVKDPAFALQLPLRVLVTETDGK 120 

VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTVGE 159 

VRTAYTDTRALIVGSRISFDEVANTLANAEKLIQKTVGE 159 
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The complete length ORF97ng nucleotide sequence <SEQ ID 361> is predicted to encode a protein 
having amino acid sequence <SEQ ID 362>: 

1 MKHILPPIAfl SAFCISTASA HPAGKPPTQN ETAMTTHTLT SKYSFDETVS 
51 RLETAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK 
5 101 DPAFALQLPL RVLVTETDGK VRTAYTDTRA LIVGSRISFD EVANTLANAE 

151 KLIQKTVGE* 

Further work revealed the complete nucleotide sequence <SEQ ID 363>: 



1 ATGAAACACA TACTCCCcct gatcgccgca TocgcactCT GCATTTCAAC 

51 CGCTTCGGCA CACCCTGCCG GCAAACCGCC CACCCAAAAC GAAACCGCTA 

101 TGACCACGCA CACCCTCACC TCGAAATACA GTTTTGACGA AACCGTCAGC 

151 CGCCTTGAAA CCGCCATAAA AAGCAAAGGG ATGGACATTT TTGCCGTCAT 

201 CGACCATCAG GAAGCGGCAC GCCGAAACGG CCTGACCATG CAGCCGGCAA 

251 AAGTCATCGT CTTCGGCACG CCCAAGGCCG GTACGCCgct GATGGTCAAA 

301 GACCCCGCCT TCGCCCTGCA ACTGCCCCTG CGCGTCCTCG TTACCGAAAC 

351 GGACGGCAAA GTACGCACCG CCTATACCGA TACGCGCGCC CTCATCGTCG 

401 GCAGCCGCAT CAGTTTCGAC GAAGTGGCAA ACACTTTGGC AAACGCCGAA 

451 AAACTGATAC AAAAAACCGT AGGCGAATAA 

This corresponds to the amino acid sequence <SEQ ID 364; ORF97ng-l>: 



1 MKHILPLIAA SALCISTASA HPAGKPPTQN ETAMTTHTLT SKYSFDETVS 

51 RLETAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK 

101 DPAFALQLPL RVLVTETDGK VRTAYTDTRA LIVGSRISFD EVANTLANAE 

151 KLIQKTVGE* 

ORF97ng-l and ORF97-1 show 96.2% identity in 159 aa overlap: 

10 20 30 40 50 60 

orf 97-1 . pep MKHILPLIAASALCISTASAHPASEPSTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG 

orf97ng-l MKHILPLIAASALCISTASAHPAGKPPTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG 



orf 97-1 . pep 
orf97ng-l 



MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I t I I I I 1 I I I 
MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK 



orf 97-1. pep 
orf97ng-l 



VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTVGEX 
VRTAYTDTRALIVGSRISFDEVANTLANAEKLIQKTVGEX 



Based on this analysis, including the presence of a putative leader sequence in the gonococcal 
protein, it was predicted that the proteins from N.meningitidis and N. gonorrhoeae, and their 
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



ORF97-I (15.3kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described 
45 above. The products of protein expression and purification were analyzed by SDS-PAGE. Figures 
12A & 12B show, repsectively, the results of affinity purification of the GST-fiision and His-fusion 
proteins. Purified GST-fiision protein was used to immunise mice, whose sera were used for 
Western Blot (Figure 12C), ELISA (positive result), and FACS analysis (Figure 12D). These 
experiments confirm that ORF97-1 is a surface-exposed protein, and that it is a useful immunogen. 
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Figure 12E shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF97-1. 



Example 43 

The following DNA, believed to be complete, sequence was identified in N. meningitidis <SEQ ID 
365>: 



5 1 ATGGCTTTTA TTRCGCGCTT ATTCAAAAGC AGTAAATGGC TGATTGTGCC 

51 GCTGATGCTC CCCGCCTTTC AGAATGTGGC GGCGGAGGGG ATAGATGTGA 

101 GCCGTGCCGA AGCGAGGATA ACCGACGGCG GGCAGCTTTC CATCAGCAGC 

151 CGCTTCCAAA CCGAGCTGCC CGACCAGCTC CAACAGGCGT TGCGCCGGGg 

201 CGTGCCGCTC AACTTTACCT TAAGCTGGCA GCTTTCCGCC CCGATAATCG 

10 251 CTTCTTATCG GTTTAAATTG GGGCAACTGA TTGGCGATGA CGACaATATT 

301 GACTACAAAC TGAGTTTCCA TCCGCTGACc AaACGCTACC GCGTTACCgT 

351 CGgCGCGTTT TCGACAGACT ACGACACCTT GGATGCGGCA TTGCGCGCGA 

401 CCGGCGCGGT TGCCAACTGG AAAGTCCTGA ACAAAGGCGC GCTGTCCGGT 

4 51 GCGGAAGCAG GGGAAACCAA GGCGGAAATC CGCCTGACGC TGTCCACTTC 

15 501 AAAACTGCCC AAGCCTTTTC AAATCAATGC ATTGACTTCT CAAAACTGGC 

651 ATTTGGATTC GGGTTGGAAA CCTCTAAACA TCATCGGGAA CAAATAA 

This corresponds to the amino acid sequence <SEQ ID 366; ORF106>: 

1 MAFITRLFKS SKWLIVPLML PAFQNVAAEG IDVSRAEARI TDGGQLSISS 

51 RFQTELPDQL QQALRRGVPL NFTLSWQLSA PIIASYRFKL GQLIGDDDNI 

20 101 DYKLSFHPLT KRYRVTVGAF STDYDTLDAA LRATGAVANW KVLNKGALSG 

151 AEAGETKAEI RLTLSTSKLP KPFQINALTS QNWHLDSGWK PLNIIGNK* 

Further work revealed the following DNA sequence <SEQ ED 367>: 

1 ATGGCTTTTA TTACGCGCTT ATTCAAAAGC AGTAAATGGC TGATTGTGCC 

51 GCTGATGCTC CCCGCCTTTC AGAATGTGGC GGCGGAGGGG ATAGATGTGA 

25 101 GCCGTGCCGA AGCGAGGATA ACCGACGGCG GGCAGCTTTC CATCAGCAGC 

151 CGCTTCCAAA CCGAGCTGCC CGACCAGCTC CAACAGGCGT TGCGCCGGGG 

201 CGTGCCGCTC AACTTTACCT TAAGCTGGCA GCTTTCCGCC CCGATAATCG 

251 CTTCTTATCG GTTTAAATTG GGGCAACTGA TTGGCGATGA CGACAATATT 

301 GACTACAAAC TGAGTTTCCA TCCGCTGACC AACCGCTACC GCGTTACCGT 

30 351 CGGCGCGTTT TCGACAGACT ACGACACCTT GGATGCGGCA TTGCGCGCGA 

401 CCGGCGCGGT TGCCAACTGG AAAGTCCTGA ACAAAGGCGC GCTGTCCGGT 

451 GCGGAAGCAG GGGAAACCAA GGCGGAAATC CGCCTGACGC TGTCCACTTC 

501 AAAACTGCCC AAGCCTTTTC AAATCAATGC ATTGACTTCT CAAAACTGGC 

551 ATTTGGATTC GGGTTGGAAA CCTCTAAACA TCATCGGGAA CAAATAA 

35 This corresponds to the amino acid sequence <SEQ ID 368; ORF106-1>: 

1 MAFITRLFKS SKWLIVPLML PAFQNVAAEG IDVSRAEA RI TDGGQLSISS 

51 RFQTELPDQL QQALRRGVPL NFTLSWQLSA PIIASYRFKL GQLIGDDDNI 

101 DYKLSFHPLT WRYRVTVGAF STDYDTLDAA LRATGAVANW KVLNKGALSG 

151 AEAGETKAEI RLTLSTSKLP KPFQINALTS QNWHLDSGWK PLNIIGNK* 

40 Computer analysis of this amino acid sequence gave the following resuUs: 



Homology with a predicted ORF from N.meninsitidis (strain A) 

ORF106 shows 87.4% identity over a 199aa overlap with an ORF (ORF106a) from strain A of A^. 
meningitidis: 

10 20 30 40 50 59 

45 or f 106 . pep MAFITRLFKSSK-WLIVPLMLPAFQNVAAEGIDVSRAEARITDGGQLSISSRFQTELPDQ 

I I I I I I I I I I I II:: II : : :: I I 1 M I M 1 I I I I I : I I II I I I I I I I I I I I I 
orfl06a MAFITRLFKSIKQWLVLLPMLSVLPDAAAEGIDVSRAEARIXDGGQLSXXSRFQTELPDQ 
10 20 30 40 50 60 

50 60 70 80 90 100 110 119 

orf 106 . pep LQQALRRGVPLNFTLSWQLSAPIIASYRFKLGQLIGDDDNIDYKLSFHPLTKRYRVTVGA 
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or f 10 6a LQXAXXRGVXLNXTLXWQLSAPIIASYRFXLGQLIGDDDXIDYKLSFHPLTNRYRVTVGA 
70 80 90 100 110 120 

120 130 140 150 160 170 179 

5 or f 1 0 6 . pep FSTDYDTLDAALRATGAVANWKVLNKGALSGAEAGETKAEIRLTLSTSKLPKPFQINALT 

III I I II II II I I II II II I I I I I I I I II I II M I I I II I I I II II M II I I I II I M I 
orfl06a FSTXYDTLDAALRATGAVANWKVLNKGALSGAEAGETKAEIRLTLSTSKLPKPFQINALT 
130 140 150 160 170 180 

10 180 190 199 

orflOe.pep SQNWHLDSGWKPLNIIGNKX 
I I I I I I II I I I I I I I I I I I I 
orfl06a SQNWHLDSGWKPLNIIGNKX 
190 200 

1 5 Due to the K-^N substitution at residue 1 1 1 , the homology between ORF 1 06a and ORF 1 06- 1 is 
87.9% over the same 199 aa overlap. 



The complete length ORF 106a nucleotide sequence <SEQ ID 369> is: 



1 ATGGCTTTTA TTACGCGCTT ATTCAAAAGC ATTAAACAAT GGCTTGTGCT 

51 GCTGCCGATG CTTTCCGTTT TGCCGGACGC GGCGGCGGAG GGGATAGATG 

20 101 TGAGCCGCGC CGAAGCGAGG ATAANCGACG GCGGGCAGCT TTCCATNAGN 

151 AGCCGCTTCC AAACCGAGCT GCCCGACCAG CTCCAANNNG CGNNGNGCCG 

201 GGGCGTGNCG CTCAACTNTA CCTTAAGNTG GCAGCTTTCC GCCCCGATAA 

251 TCGCTTCTTA TCGGTTTNAA TTGGGGCAAC TGATTGGCGA TGACGACNAT 

301 ATTGACTACA AACTGAGTTT CCATCCGCTG ACCAACCGCT ACCGCGTTAC 

25 351 CGTCGGCGCG TTTTCGACAG ANTACGACAC CTTGGATGCG GCATTGCGCG 

401 CGACCGGCGC GGTTGCCAAC TGGAAAGTCC TGAACAAAGG CGCGCTGTCC 

451 GGTGCGGAAG CAGGGGAAAC CAAGGCGGAA ATCCGCCTGA CGCTGTCCAC 

501 TTCAAAACTG CCCAAGCCTT TTCAAATCAA TGCATTGACT TCTCAAAACT 

551 GGCATTTGGA TTCGGGTTGG AAACCTCTAA ACATCATCGG GAACAAATAA 

30 This encodes a protein having amino acid sequence <SEQ ED 370>: 

1 MAFITRLFKS IKQWLVLLPM LSVLPDAAAE GIDVSRAEA R IXDGGQLSXX 

51 SRFQTELPDQ LQXAXXRGVX LNXTLXWQLS APIIASYRFX LGQLIGDDDX 

101 IDYKLSFHPL TNRYRVTVGA FSTXYDTLDA ALRATGAVAN WKVLNKGALS 

151 GAEAGETE<AE IRLTLSTSKL PKPFQINALT SQNWHLDSGW KPLNIIGNK-" 

35 

Homology with a predicted ORF from N.sonorrhoeae 

ORF106 shows 90.5% identity over a 199aa overlap with a predicted ORF (ORF106.ng) from N. 
gonorrhoeae: 



orflOe.pep MAFITRLFKSSK-WLIVPLMLPAFQNVAAEGIDVSRAEARITDGGQLSISSRFQTELPDQ 



orf 106ng 

orflOe.pep 

orfl06ng 

orf 106. pep 

orfl06ng 

orflOe.pep 

orfl06ng 



I I I I 



I 



: I 



: I I I I I 



I I I 



I I I 



MAFITRLFKSIKQWLVLLPILSVLPDAAAEGIAATRAEARITDGGRLSISSRFQTELPDQ 

LQQALRRGVPLNFTLSWQLSAPIIASYRFKLGQLIGDDDNIDYKLSFHPLTKRYRVTVGA 

I I I I I I I I I I I I I I I I I I I I I I I I M M I I M I M I I I I I I I I I I 1 I I I I : I I I I I I I I 
LQQALRRGVPLNFTLSWQLSAPTIASYRFKLGQLIGDDDNIDYKLSFHPLTNRYRVTVGA 

FSTDYDTLDAALRATGAVANWKVLNKGALSGAEAGETKAEIRLTLSTSKLPKPFQINALT 

II II I II I I I I M I I I II I I I I I I I I I I I I I I I I I I I I I I I I II II I I I I I I I I I I I II I 
FSTDYDTLDAALRATGAVANWKVLNKGALSGAEAGETKAEIRLTLSTSKLPKPFQINALT 

SQNWHLDSGWKPLNIIGNK 198 

II I I I II I I II I I I I I II I 
SQNWHLDSGWKPLNIIGNK 199 



Due to the K^N substitution at residue 1 1 1 , the homology between ORFl 06ng and ORF 1 06-1 ii 



55 91.0%) over the same 1 99 aa overlap. 
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The complete length ORF106ng nucleotide sequence <SEQ ID 371> is: 

1 ATGGCTTTTA TTACGCGCTT ATTCAAAAGC ATTAAACAAT GGCTTGTGCT 

51 GTTGCCGATA CTCTCCGTTT TGCCGGACGC GGCGGCGGAG GGCATTGCCG 

101 CGACCCGCGC CGAAGCGAGG ATAACCGACG GCGGGCGGCT TTCCATCAGC 

151 AGCCGCTTCC AAACCGAGCT GCCCGACCAG CTCCAACAGG CGTTGCGCCG 

2 01 GGGCGTACCG CTCAACTTTA CCTTAAGCTG GCAGCTTTCC GCCCCGACAA 

251 TCGCTTCTTA TCGGTTTAAA TTGGGGCAAC TGATTGGCGA TGACGACAAT 

301 ATTGACTACA AACTAAGTTT CCATCCGCTG ACCAACCGCT ACCGCGTTAC 

351 CGTCGGCGCA TTTTCCACCG ATTACGACAC TTTGGATGCG GCATTGCGCG 

4 01 CGACCGGCGC GGTTGCCAAC TGGAAAGTCC TGAACAAAGG CGCGTTGTCC 

4 51 GGTGCGGAAG CAGGGGAAAC CAAGGCGGAA ATCCGCCTGA CGCTGTCCAC 

501 TTCAAAACTG CCCAAGCCTT TCCAAATCAA CGCATTGACT TCTCAAAACT 

551 GGCATTTGGA TTCGGGTTGG AAACCTCTAA ACATCATCGG GAACAAATAA 

This encodes a protein having amino acid sequence <SEQ ID 372>: 

1 MAFITRLFKS IKQWLVLLPI LSVLPDAAAE GIAATRAEA R ITDGGRLSIS 

51 SRFQTELPDQ LQQALRRGVP LNFTLSWQLS APTIASYRFK LGQLIGDDDN 

101 IDYKLSFHPL TNRYRVTVGA FSTDYDTLDA ALRATGAVAN WKVLNKGALS 

151 GAEAGETKAE IRLTLSTSKL PKPFQINALT SQNWHLDSGW KPLNIIGNK* 

Based on this analysis, including the presence of a putative leader sequence in the gonococcal 
protein, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their 
epitopes, could he useful antigens for vaccines or diagnostics, or for raising antibodies. 



ORF106-1 (18kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described 
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure 
13A shows the resuUs of affinity purification of the His-fiision protein, and Figure 13B shows the 
results of expression of the GST-fiision in E.coli. Purified His-fusion protein was used to immunise 
mice, whose sera were used for FACS analysis (Figure 13C) These experiments confirm that 
ORF 106-1 is a surface-exposed protein, and that it is a useful immunogen. 



Example 44 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 
373>: 



1 ATGGACACAA AAGAAATCCT 

51 GGTTTTAGCC GTCATCATCc 

101 ACGACATCGG GCGCATCGTG 

151 TCGGTGTTGT GCCTCGGGCT 

201 CACCGCCGAC AAAGACAcCT 

251 TGTCTGCCGC CGCGATAGCC 

301 TCTGAAATCC TGTTTTCACT 

351 GCTGTTTGAA CtGAGCTTCC 

401 GTATGGAAGG ACGCGCCcTT 

451 CTCGCCATCC TGCTGCTG.T 

501 AGCGAACACC GCCGTCCTGA 

551 CCGCCGCCTT TTTGCTGTTT 

601 CACGCACCGT TTTCGCCCGC 

651 ACCGATCGCA CTGAGCAGCA 

701 GTTTGTTCCT GAAAAAATAT 

751 ATGGGTATTT CGTTCGGCGG 

801 AACGGTCTGG ACACCGTATA 

851 CCGCTCGCCT CTCGGCAACG 

901 GCCCTCTGC. TGACCGGCAT 

951 GGAAAACTAC GCCGCCGTCC 



CGG.TACGCG GcAGGcTCGA TCGGCAGCGC 
TGCCGCTGCT GTCGTGGTAT TTCCCCGCCG 
CTGATGCAGA CGGCGGCGGG GCTgACGGTG 
GGATCAGGCA TACGTCCGCG AATACTATGC 
TGTTCAAAAC CCTGTTCCTG CCGCCGCTGC 
GCCCTGCTGC TTTCCCGCCC GTCCCTGCCG 
CGACGATGCC gCCGCCGGCa TCGGGCTGGT 
TGCCCATCCG CTTTCTCTTA CTGGTTTTGC 
GCCTTTTCGT CCGCGCAACT CGTGCcCAAG 
GCCGCTGACG GTCGGGCTGC TGCACTTTCC 
CCGCCGTTTA CGCGCTGGCA AACCTTGCCG 
CAAAACCGAT GCCGTCTGAA GGCCGTCCGG 
CGTCCTGCAC CGGGGG.TGC GCTACGGCAT 
TCGCCTATTG GGGGCTGGCA TCCGCCGACC 
GCCGGCCTGG AACAGCTCGG CGTTTATTCG 
GGCGGCATTA TTGTTCCAAA GCATCTTTTC 
TTTTCCGCGC AATCGAAGAA AACGCCCCGC 
GCAGAATCCG CCGCCGCCCT GCTTGCCTCC 
TTTCTCGCCC CTTGCCTCCC TCCTGCTGCC 
GGTTTATCGT CGTATCGTGT ATG.TGCCGC 
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1001 CGCTGTTTTG CACGCTGGCG GAAATCAGCG GCATCGGTTT GAACGTCGTT 

1051 CGCAAAACGC GCCCGATCGC GCTCGCCACC TTGGGCGCGC TGGCGGCAAA 

1101 CCTGCTGCTG CTGGGGCTTG ACCGTGCCGT ACCGGCGAGG CCGCC.GGCG 

1151 CGGCGGTTGC CTGTGCCGCC TCATTCTGGC TGTTTTTTGC CTTCAAGACC 

5 1201 GAAAGCTCyT GCCGCCTGTG GCAGCCGCTC AAACGCCTGC CGCTTTATCT 

1251 GCACACATTG TTCTGCCTGA CCTCCTCGGC GGCCTACACC TGCTTCGGCA 

1301 CGCCGGCAAA CTATCCCCTG TTTGCCGGCG TATGGGCGGC ATATCTGGCA 

1351 GGCTGCATCC TGCGCCACCG GAAAGATTTG CACAAACTGT TTCATTATTT 

1401 GAAAAAACAA GGTTTCCCAT TATGA 

10 This corresponds to the amino acid sequence <SEQ ID 374; ORF10>: 

1 MDTKEILXYA AGSIGSAVLA VIILPLLSWY FPADDIGRIV LMQTAAGLTV 

51 SVLCLGLDQA YVREYYATAD KDTLFKTLFL PPLLSAAAIA ALLLSRPSLP 

101 SEILFSLDDA AAGIGLVLFE LSFLPIRFLL LVLRMEGRAL AFSSAQLVPK 

151 LAILLLXPLT VGLLHFPANT AVLTAVYALA NLAAAAFLLF QNRCRLKAVR 

15 201 HAPFSPAVLH RGXRYGIPIA LSSIAYWGLA SADRLFLKKY AGLEQLGVYS 

251 MGISFGGAAL LFQSIFSTVW TPYIFRAIEE NAPPARLSAT AESAAALLAS 

301 ALCXTGIFSP LASLLLPENY AAVRFIVVSC MXPPLFCTLA EISGIGLNW 

351 RKTRPIALAT LGALAANLLL LGLDRAVPAR PXGAAVACAA SFWLFFAFKT 

401 ESSCRLWQPL KRLPLYLHTL FCLTSSAAYT CFGTPANYPL FAGVWAAYLA 

20 451 GCILRHRKDL HKLFHYLKKQ GFPL* 

Further sequence analysis revealed the complete DNA sequence<SEQ ID 375> to be: 

1 ATGGACACAA AAGAAATCCT CGGCTACGCG GCAGGCTCGA TCGGCAGCGC 

51 GGTTTTAGCC GTCATCATCC TGCCGCTGCT GTCGTGGTAT TTCCCCGCCG 

101 ACGACATCGG GCGCATCGTG CTGATGCAGA CGGCGGCGGG GCTGACGGTG 

25 151 TCGGTGTTGT GCCTCGGGCT GGATCAGGCA TACGTCCGCG AATACTATGC 

201 CACCGCCGAC AAAGACACCT TGTTCAAAAC CCTGTTCCTG CCGCCGCTGC 

251 TGTCTGCCGC CGCGATAGCC GCCCTGCTGC TTTCCCGCCC GTCCCTGCCG 

301 TCTGAAATCC TGTTTTCACT CGACGATGCC GCCGCCGGCA TCGGGCTGGT 

351 GCTGTTTGAA CTGAGCTTCC TGCCCATCCG CTTTCTCTTA CTGGTTTTGC 

30 4 01 GTATGGAAGG ACGCGCCCTT GCCTTTTCGT CCGCGCAACT CGTGCCCAAG 

4 51 CTCGCCATCC TGCTGCTGCT GCCGCTGACG GTCGGGCTGC TGCACTTTCC 

501 AGCGAACACC GCCGTCCTGA CCGCCGTTTA CGCGCTGGCA AACCTTGCCG 

551 CCGCCGCCTT TTTGCTGTTT CAAAACCGAT GCCGTCTGAA GGCCGTCCGG 

601 CACGCACCGT TTTCGCCCGC CGTCCTGCAC CGGGGGCTGC GCTACGGCAT 

35 651 ACCGATCGCA CTGAGCAGCA TCGCCTATTG GGGGCTGGCA TCCGCCGACC 

701 GTTTGTTCCT GAAAAAATAT GCCGGCCTGG AACAGCTCGG CGTTTATTCG 

751 ATGGGTATTT CGTTCGGCGG GGCGGCATTA TTGTTCCAAA GCATCTTTTC 

801 AACGGTCTGG ACACCGTATA TTTTCCGCGC AATCGAAGAA AACGCCCCGC 

851 CCGCCCGCCT CTCGGCAACG GCAGAATCCG CCGCCGCCCT GCTTGCCTCC 

40 901 GCCCTCTGCC TGACCGGCAT TTTCTCGCCC CTTGCCTCCC TCCTGCTGCC 

951 GGAAAACTAC GCCGCCGTCC GGTTTATCGT CGTATCGTGT ATGCTGCCGC 

1001 CGCTGTTTTG CACGCTGGCG GAAATCAGCG GCATCGGTTT GAACGTCGTC 

1051 CGCAAAACGC GCCCGATCGC GCTCGCCACC TTGGGCGCGC TGGCGGCAAA 

1101 CCTGCTGCTG CTGGGGCTTG CCGTGCCGTC CGGCGGCGCG CGCGGCGCGG 

45 1151 CGGTTGCCTG TGCCGCCTCA TTCTGGCTGT TTTTTGCCTT CAAGACCGAA 

1201 AGCTCCTGCC GCCTGTGGCA GCCGCTCAAA CGCCTGCCGC TTTATCTGCA 

1251 CACATTGTTC TGCCTGACCT CCTCGGCGGC CTACACCTGC TTCGGCACGC 

1301 CGGCAAACTA TCCCCTGTTT GCCGGCGTAT GGGCGGCATA TCTGGCAGGC 

1351 TGCATCCTGC GCCACCGGAA AGATTTGCAC AAACTGTTTC ATTATTTGAA 

50 1401 AAAACAAGGT TTCCCATTAT GA 

This corresponds to the amino acid sequence <SEQ ID 376; ORP10-1>: 

1 MDTKEILGYA AGSIGSAVLA VIILPLLSWY FPA DDIGRI V LMQTAAGLTV 

51 SVLCL GLDQA YVREYYATAD KDTLFKT LFL PPLLSAAAIA ALLLSRPSLP 

101 SEILFSLDDA AAGIG LVLFE LSFLPIRFLL LV LRMEGRAL AFSSAQLVPK 

55 151 LAILLLLPLT VGLL HFPANT AVLTAVYALA NLAAAAFL LF QNRCRLKAVR 

201 HAPFSPAVLH RGLRYGIPIA LSSIAYWGLA SADRLFLKKY AGLEQ LGVYS 

251 MGISFGGAAL LF QSIFSTVW TPYIFRAIEE NAPPARLSAT AESA AALLAS 

301 ALCLTGIFSP L ASLLLPENY AAVRFIVVSC MLPPLFCTLA EISGIGLNW 

351 RKTRPIALAT LGALAANLLL LGLAVPSGGA RGAAVACAAS FWLFFAFKTE 



Computer analysis of this amino acid sequence gave the following results: 
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Prediction 

ORFlO-1 is predicted to be the precursor of an integral membrane protein, since it comprises 
several (12-13) potential transmembrane segments, and a probable cleavable signal peptide 
Homology with EpsM from Streptococcus thermophilus (accession number U4Q830). 
ORFl 0 shows homology with the epsM gene of iS", thermophilus, which encodes a protein of a size 
similar to ORFIO and is involved in expolysaccharide synthesis. Other homologies are with 
prokaryotic membrane proteins: 



Identitj 
Query: 



213 LRYGI PLALS SLAYWGLASADRLFLKKYAGLEQLGVYSMGI SFGGAALLLQSIFSTVW 27 0 

L Y +PL SS+ +W L ++ R F+ + G G+ ++ + +IF+ W 

210 LYYALPLIPSSILWWLLNASSRYFVLFFLGAGANGLLAVATKIPSIISIFNTIFTQAW 2 67 



7 LGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQAYVR 63 
L + G++GS +L +++PL ++ + G L QT A L + ++ + + A +R 

12 LVFTIGNLGSKLLVFLLVPLYTYAMTPQEYGMADLYQTTANLLLPLITMNVFDATLR 68 



Query: 307 IFSPLASLLLPENYAAVRFTWSCMLPPLFYTLTEISGIGLNWRKTRPIXXXXXXXXXX 366 

+ P+ ++ +YA+ V ML LF + ++ G ++T+ + 

Sbjct: 305 VLKPIVEKWSSDYASSWQYVPFFMLSMLFSSFSDFFGTNYIAAKQTKGVFMTSIYGTIV 364 

Homology with a predicted ORF from N.meninzitidis (strain A) 

ORFIO shows 95.4% identity over a 475aa overlap with an ORF (ORF 10a) from strain A of A^. 
meningitidis: 

10 20 30 40 50 60 

or f 10. pep MDTKEILXYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 

or f 10a MDTKEILGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 



YVREYYATADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 
YVREYYAAADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 



LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLXPLTVGLLHFPANTAVLTAVYALA 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
LSFLPIRFLLLVLRMEGRALAFSSAQLVSKLAILLLLPLTVGLLHFPANTAVLTAVYALA 



NLAAAAFLLFQNRCRLKAVRHAPFSPAVLHRGXRYGIPIALSSIAYWGLASADRLFLKKY 
NLAAAAFLLFQNRCRLKAVRRAPFS SAVLHRGLRYGI PIALS S lAYWGLASADRLFLKKY 



AGLEQLGVYSMGISFGGAALLFQSIFSTVWTPYIFRAIEENAPPARLSATAESAAALLAS 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
AGLEQLGVYSMGISFGGAALLFQSIFSTVWTPYIFRAIEANAPPARLSATAESAAALLAS 
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310 320 330 340 350 360 

ALCXTGIFSPLASLLLPENYAAVRFIWSCMXPPLFCTLAEISGIGLNWRKTRPIALAT 
III I I I I I I I I I I I I I I I I I II I M I I I I I I II I I I I : I I I II I I I I I II II I II I II 
ALCLTGIFSPLASLLLPENYAAVRFIVVSCMLPPLFCTLVEISGIGLNWRKTRPIALAT 
310 320 330 340 350 360 

370 380 390 400 410 419 

LGALAANLLLLGLDRAVPAR-PXGAAVACAASFWLFFAFKTESSCRLWQPLKRLPLYLHT 

LGALAANLLLLGL--AVPSGGARGAAVACAASFWLFFVFKTESSCRLWQPLKRLPLYMHT 
370 380 390 400 410 

420 430 440 450 450 470 

orf 10 .pep LFCLTSSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKDLHKLFHYLKKQGFPLX 

orflOa LFCLASSAAYTCFGTPANYPLFAGVWAVYLAGCILRHRKDLHKLFHYLKKQGFPLX 
420 430 440 450 460 470 

The complete length ORFlOa nucleotide sequence <SEQ ID 377> is: 

1 ATGGACACAA AAGAAATCCT CGGCTACGCG GCAGGCTCGA TCGGCAGCGC 

51 GGTTTTAGCC GTCATCATCC TGCCGCTGCT GTCGTGGTAT TTCCCTGCCG 

101 ACGACATCGG ACGCATCGTG CTGATGCAGA CGGCGGCGGG GCTGACGGTG 

151 TCGGTGTTGT GCCTCGGGCT GGATCAGGCA TACGTCCGCG AATACTATGC 

201 CGCCGCCGAC AAAGACACTT TGTTCAAAAC CCTGTTCCTG CCGCCGCTGC 

251 TGTCTGCCGC CGCGATAGCC GCCCTGCTGC TTTCCCGCCC ATCCCTGCCG 

301 TCTGAAATCC TGTTTTCGCT CGACGATGCC GCCGCCGGCA TCGGGCTGGT 

351 GCTGTTTGAA CTGAGCTTCC TGCCCATCCG CTTTCTCTTA CTGGTTTTGC 

401 GTATGGAAGG ACGCGCCCTT GCCTTTTCGT CCGCGCAACT CGTGTCCAAG 

451 CTCGCCATCC TGCTGCTGCT GCCGCTGACG GTCGGGCTGC TGCACTTTCC 

501 GGCGAACACC GCCGTCCTGA CCGCCGTTTA CGCGCTGGCA AACCTTGCCG 

551 CCGCCGCCTT TTTGCTGTTT CAAAACCGAT GCCGTCTGAA GGCCGTCCGG 

601 CGCGCACCGT TTTCATCCGC CGTCCTGCAT CGCGGCCTGC GCTACGGCAT 

651 ACCGATCGCA CTAAGCAGCA TCGCCTATTG GGGGCTGGCA TCCGCCGACC 

701 GTTTGTTCCT GAAAAAATAT GCCGGCCTAG AACAGCTCGG CGTTTATTCG 

751 ATGGGTATTT CGTTCGGCGG AGCGGCATTA TTGTTCCAAA GCATCTTTTC 

801 AACGGTCTGG ACACCGTATA TTTTCCGCGC AATCGAAGCA AACGCCCCGC 

851 CCGCCCGCCT CTCGGCAACG GCAGAATCCG CCGCCGCCCT GCTTGCCTCC 

901 GCCCTCTGCC TGACCGGCAT TTTCTCGCCC CTCGCCTCCC TCCTGCTGCC 

951 GGAAAACTAC GCCGCCGTCC GGTTTATCGT CGTATCGTGT ATGCTGCCTC 

1001 CGCTGTTTTG CACGCTGGTA GAAATCAGCG GCATCGGTTT GAACGTCGTC 

1051 CGAAAAACAC GCCCGATCGC GCTCGCCACC TTGGGCGCGC TGGCGGCAAA 

1101 CCTGCTGCTG CTGGGGCTTG CCGTACCGTC CGGCGGCGGG CGCGGCGCGG 

1151 CGGTTGCCTG TGCCGCCTCA TTTTGGCTGT TTTTTGTTTT CAAGACCGAA 

1201 AGCTCCTGCC GCCTGTGGCA GCCGCTCAAA CGCCTGCCGC TTTATATGCA 

1251 CACATTGTTC TGCCTGGCCT CCTCGGCGGC CTACACCTGC TTCGGCACTC 

1301 CGGCAAACTA CCCCCTGTTT GCCGGCGTAT GGGCGGTATA TCTGGCAGGC 

1351 TGCATCCTGC GCCACCGGAA AGATTTGCAC AAACTGTTTC ATTATTTGAA 

14 01 AAAACAAGGT TTCCCATTAT GA 

This encodes a protein having amino acid sequence <SEQ ID 378>; 

1 MDTKEILGYA AGSIGSAVLA VIILPLLSWY FPADDIGRIV LMQTAAGLTV 

51 SVLCLGLDQA YVREYYAAAD KDTLFKTLFL PPLLSAAAIA ALLLSRPSLP 

101 SEILFSLDDA AAGIGLVLFE LSFLPIRFLL LVLRMEGRAL AFSSAQLVSK 

151 LAILLLLPLT VGLLHFPANT AVLTAVYALA NLAAAAFLLF QNRCRLKAVR 

201 RAPFSSAVLH RGLRYGIPIA LSSIAYWGLA SADRLFLKKY AGLEQLGVYS 

251 MGISFGGAAL LFQSIFSTVW TPYIFRAIEA NAPPARLSAT AESAAALLAS 

301 ALCLTGIFSP LASLLLPENY AAVRFIVVSC MLPPLFCTLV EISGIGLNW 

351 RKTRPIALAT LGALAANLLL LGLAVPSGGA RGAAVACAAS FWLFFVFKTE 

401 SSCRLWQPLK RLPLYMHTLF CLASSAAYTC FGTPANYPLF AGVWAVYLAG 

4 51 CILRHRKDLH KLFHYLKKQG FPL* 

ORFlOa and ORFlO-1 show 95.4% identity in 475 aa overlap: 

10 20 30 40 50 60 

orf 10-1. pep MDTKEILXYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 

orf 10a MDTKEILGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 
10 20 30 40 50 60 



orflO.pep 
orflOa 

orflO.pep 
orflOa 
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YVREYYATADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 
YVREYYAAADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 



LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLXPLTVGLLHFPANTAVLTAVYALA 
LSFLPIRFLLLVLRMEGRALAFSSAQLVSKLAILLLLPLTVGLLHFPANTAVLTAVYALA 



NLAAAAFLLFQNRCRLKAVRHAPFSPAVLHRGXRYGIPIALSSIAYWGLASADRLFLKKY 
I I I I I I I I I I I I I I I M I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
NLAAAAFLLFQNRCRLKAVRRAPFSSAVLHRGLRYGIPIALSSIAYWGLASADRLFLKKY 



AGLEQLGVYSMGISFGGAALLFQSIFSTVWTPYIFRAIEENAPPARLSATAESAAALLAS 



orf 10-1. pep 

orflOa 



ALCXTGIFSPLASLLLPENYAAVRFIVVSCMXPPLFCTLAEISGIGLNWRKTRPIALAT 
ALCLTGIFSPLASLLLPENYAAVRFIVVSCMLPPLFCTLVEISGIGLNWRKTRPIALAT 



orflO-l.pep 
orflOa 



LGALAANLLLLGLDRAVPAR-PXGAAVACAASFWLFFAFKTESSCRLWQPLKRLPLYLHT 
LGALAANLLLLGL— AVP3GGARGAAVACAASFWLFFVFKTESSCRLWQPLKRLPLYMHT 



orf 10-1 .pep 
orflOa 



LFCLTSSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKDLHKLFHYLKKQGFPLX 
LFCLASSAAYTCFGTPANYPLFAGVWAVYLAGCILRHRKDLHKLFHYLKKQGFPLX 



Homology with a predicted ORF from N.sonorrhoeae 

ORFIO shows 94.1% identity over a 475aa overlap with a predicted ORF (ORFlO.ng) from A': 
gonorrhoeae: 

orf 1 Ong . pep MDTKEILGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 60 
or f 1 Onm MDTKE ILXYAAGS I GS AVLAVI ILPLLSWYFPADD I GRI VLMQTAAGLTVSVLCLGLDQA 6 0 

orflOng.pep YVREYYAAADKDTLFKTLFLPPLLFSAAIAALLL3RPSLPSEILFSLDDAAAGIGLVLFE 120 



)rf lOnm 
irf lOng . p< 



I ! 



I I I 



YVREYYATADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 120 

LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLLPLTVGLLHFPANTSVLTAVYALA 180 

I 1 I I I I i I i I I 1 I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I 1 I I I I I I I : I I I I I I I I I 

orflOnm LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLXPLTVGLLHFPANTAVLTAVYALA 180 

orflOng.pep NLAAAAFLLFQNRCRLKAVRRAPFSPAVLHRGLRYGIPLALSSLAYWGLASADRLFLKKY 240 

I I I I I I I I I ! I I I I I I I I I I : I I I I I I I I I 1 I I I I ! I : I I I I : I I I I I I I I I I I I I I I I 

orflOnm NLAAAAFLLFQNRCRLKAVRHAPFSPAVLHRGXRYGIPIALSSIAYWGLASADRLFLKKY 240 

orf lOng .pep AGLEQLGVYSMGISFGGAALLLQSIFSTVWTPYIFRAIEENATPARLSATAESAAALLAS 300 

orflOnm AGLEQLGVYSMGISFGGAALLFQSIFSTVWTPYIFRAIEENAPPARLSATAESAAALLAS 300 

orflOng.pep ALCLTGIFSPLASLLLPENYAAVRFTWSCMLPPLFYTLTEISGIGLNVVRKTRPIALAT 360 

orflOnm ALCXTGIFSPLASLLLPENYAAVRFIWSCMXPPLFCTLAEISGIGLNWRKTRPIALAT 360 
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370 380 390 400 410 

orf lOng . pep LGALAANLLLLGL— AVPSGGTRGAAVACAASFWLFFVFKTESSCRLWQPLKRLPLYMHT 
I I i I I I I I ! I I ! I Ml: ! I I I I I I I I I I I I I : I I I M I I I I I I I I I I I I I I : I I 
orflOnm LGALAANLLLLGLDRAVPAR-PXGAAVACAASFWLFFAFKTESSCRLWQPLKRLPLYLHT 

370 380 390 400 410 

420 430 440 450 460 470 

orf lOng . pep LFCLASSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKNLHKLFHYLKKQGFPLX 

orflOnm LFCLTSSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKDLHKLFHYLKKQGFPLX 
420 430 440 450 460 470 

The complete length ORFlOng nucleotide sequence <SEQ ID 379> is: 

1 ATGGACACAA AAGAAATCCT CGGCTACGCG GCAGGCTCGA TCGGCAGCGC 

51 GGTTTTAGCC GTCATCATCC TGCCGCTGCT GTCGTGGTAT TTCcccgCCG 

101 ACGACATCGG GCGCATCGTG CTGATGCAGA CGGCGGCGGG ACTGACGGTG 

151 TCGGTATTGT GCCTCGGGCT GGATCAGGCA TACGTCCGCG AATACTATGC 

201 CGCCGCCGAC AAAGACACTT TGTTCAAAAC CCTGTTCCTG CCGCCGCTGC 

251 TGTTTTCCGC CGCGATAGCC GCCCTGCTGC TTTCCCGCCC GTCCCTGCCG 

301 TCTGAAATCC TGTTTTCGCT CGACGATGCC GCCGCCGGCA TCGGGCTGGT 

351 GCTGTTTGAA CTGAGCTTCC TGCCCATCCG CTTTCTCTTA CTGGTTTTGC 

401 GTATGGAAGG GCGCGCCCTT GCCTTTTCGT CCGCGCAACT CGTGCCCAAA 

451 CTCGCCATTC TGCTGCTGTT GCCGCTGACG GTCGGGCTGC TGCACTTTCC 

501 GGCGAACACC TCCGTCCTGA CCGCCGTTTA CGCGCTGGCA AACCTTGCCG 

551 CCGCCGCCTT TTTGCTGTTT CAAAACCGAT GCCGTCTGAA GGCCGTCCGG 

601 CGCGCGCCGT TTTCGCCCGC CGTCCTGCAC CGGGGGCTGC GCTACGGCAT 

651 ACCGCTCGCA CTGAGCAGCC TTGCCTATTG GGGGCTGGCA TCCGCCGACC 

701 GTTTGTTCCT GAAAAAATAT GCGGGCCTGG AACAGCTCGG CGTTTATTCG 

751 ATGGGTATTT CGTTCGGCGG GGCGGCATTA TTGCTCCAAA GCATCTTTTC 

801 AACGGTCTGG ACACCGTATA TTTTCCGTGC AATCGAAGAA AACGCCACGC 

851 CCGCCCGCCT CTCGGCAACG GCAGAATCCG CCGCCGCCCT GCTTGCCTCC 

901 GCCCTCTGCC TGACCGGAAT TTTCTCGCCC CTCGCCTCCC TCCTGCTGCC 

951 GGAAAACTAC GCCGCCGTCC GGTTTACCGT CGTATCGTGT ATGCTGccgc 

1001 cgctGTTTTA CACGCTGACC GAAATCAGCG GCATCGGTTT GAACGTCGTC 

1051 CGCAAAACGC GTCCGATCGC GCTTGCCACC TTGGGCGCGC TGGCGGCAAA 

1101 CCTGCTGCTG CTGGGGCTTG CCGTACCGTC CGGCGGCACG CGCGGCGCGG 

1151 CGGTTGCCTG TGCCGCCTCA TTCTGGTTGT TTTTTGTTTT CAAGACAGAA 

1201 AGCTCCTGCC GCCTGTGGCA GCCGCTCAAA CGCCTGCCGC TTTATATGCA 

1251 CACATTGTTC TGCCTgGCCT CCTCGGCGGC CTACACCTGC TTCGGCACAC 

1301 CGGCAAACTA CCCcctgttt gccggcgtAT GGGCGGCATA TCTGGCAGGC 

1351 TGCATCCTGC GCCACCGGAA AAATTTGCAC AAACTGTTTC ATTATTTGAA 

1401 AAAACAAGGT TTCCCATTAT GA 

This encodes a protein having amino acid sequence <SEQ ID 380>: 

1 MDTKEILGYA AGSIGSAVLA VIILPLLSWY FPA DDIGRIV LMQTAAGLTV 

51 SVLCLGLDQA YVREYYAAAD KDTLFKTL FL PPLLFSAAIA ALLL SRPSLP 

101 SEILFSLDDA AAGIGLVLFE LSFLPIRFLL LVLRMEGRAL AFSSAQLVPK 

151 lAIL LLLPLT VGLLHFPANT SVLTAVYALA NLAAAAFLLF QNRCRLKAVR 

201 RAPFSPAVLH RGLRYGIPLA LSSLAYWGLA SADRLFLKKY AGLEQLGVYS 

251 MGISFGGAAL LLQSIFSTVW TPYIFRAIEE NATPARLSAT AESAAALLAS 

301 ALCLTGIFSP LASLLLPENY AAVRFTWSC MLPPLFYTLT EISGIGLNW 

351 RKTRPI ALAT LGALAAMLLL LGLA VPSGGT RGAAVACAAS FWLFFVFKTE 

401 SSCRLWQPLK RLPLYMHTLF CLASSAAYTC FGTPANYPLF AGVWAAYLAG 

451 CILRHRKNLH KLFHYLKKQG FPL* 

ORFlOng and ORFlO-1 show 96.4% identity in 473 aa overlap: 

10 20 30 40 50 60 

orf 10-1. pep MDTKEILGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 
I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I I 1 I I I i I I I I I I I I I I I I I I I I 
orflOng-1 MDTKEILGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 10-1. pep YVREYYATADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 

orflOng-1 YVREYYAAADKDTLFKTLFLPPLLFSAAIAALLLSRP3LPSEILFSLDDAAAGIGLVLFE 
70 80 90 100 110 120 
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130 140 150 160 170 180 

orf 10-1 . pep LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLLPLTVGLLHFPANTAVLTAVYALA 

orflOng-1 LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLLPLTVGLLHFPANTSVLTAVYALA 
130 140 150 160 170 180 

190 200 210 220 230 240 

orf 10-1. pep NLAAAAFLLFQNRCRLKAVRHAPFSPAVLHRGLRYGIPIALSSIAYWGLASADRLFLKKY 
!! I I I I I I I I I M I I I I I I I : I I I I I I I I I I I I I I I I I : I I I I : I I I I I I I I I I I I I I I I 
orflOng-1 NLAAAAFLLFQNRCRLKAVRRAPFSPAVLHRGLRYGIPLALSSLAYWGLASADRLFLKKY 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 10-1 . pep AGLEQLGVYSMGISFGGAALLFQSIFSTVWTFYIFRAIEENAPPARLSATAESAAALLAS 

orflOng-1 AGLEQLGVYSMGISFGGAALLLQSIFSTVWTPYIFRAIEENATPARLSATAESAAALLAS 
250 260 270 280 290 300 



310 320 330 340 350 360 

orf 10-1. pep ALCLTGIFSPLASLLLPENYAAVRFIWSCMLPPLFCTLAEISGIGLNVVRKTRPIALAT 

orflOng-1 ALCLTGIFSPLASLLLPENYAAVRFTWSCMLPPLFYTLTEISGIGLNWRKTRPIALAT 
310 320 330 340 350 360 



orf 10-1 . pep LGALAANLLLLGLAVPSGGARGAAVACAASFWLFFAFKTESSCRLWQPLKRLPLYLHTLF 
orflOng-1 LGALAANLLLLGLAVPSGGTRGAAVACAASFWLFFVFKTESSCRLWQPLKRLPLYMHTLF 



irf 10-1 . pep CLTSSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKDLHKLFHYLKKQGFPLX 
irflOng-l CLASSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKNLHKLFHYLKKQGFPLX 



Based on this analysis, including the presence of a putative leader peptide and several 
transmembrane segments and the presence of a leucine-zipper motif (4 Leu residues spaced by 6 
aa, shown in bold), it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 45 



The following partial DNA sequence was identified in N. meningitidis <SEQ ED 381>: 



1. .ATCCTGAAAC CGCATAACCA GCTTAAGGAA GACATCCAAC CTGATCCGGC 

51 CGATCAAAAC GCCTTGTCCG AACCGGATGC TGCGACAGAG GCAGAGCAGT 

101 CGGATGCGGA AAATGCTGCC GACAAGCAGC CCGTTGCCGA TAAAGCCGAC 

151 GAGGTTGAAG AAAAGGCGGG CGAGCCGGAA CGGGAAGAGC CGGACGGACA 

201 GGCAGTGCGT AAGAAAGCGC TGACGGAAGA GCGTGAACAA ACCGTCAGGG 

251 AAAAAGCGCA GAAGAAAGAT GCCGAAACGG TTAAAATACA AGCGGTAAAA 

301 CCGTCTAAAG AAACAGAGAA AAAAGCTTCA AAAGAAGAGA AAAAGGCGGG 

351 GAAGGAAAAA GTTGCACCCA AACCAACCCC GGAACAAATC CTCAACAGCG 

401 GCAgCATCGA AAAmGCGCGC AgTGCCGCCG CCAAAGAAGT GCAGAAAATG 

451 AA.AACGTCC GACAAGGCGG AAGC.AACGC ATTATCTGCA AATGGGCGCG 

501 TATGCCGACC GTCAGAGCGC GGAAGGGCAG CGTGCCAAAC TGGCAATCTT 

551 GGGCATATCT TCCAAGGTGG TCGGTTATCA GGCGGGACAT AAAACGCTTT 

601 ACCGGGTGCA AAGCGGCAAT ATGTCTGCCG ATGCGGTGA 

This corresponds to the amino acid sequence <SEQ ID 382; ORF65>: 



1 . . ILKPHNQLKE DIQPDPADQN ALSEPDAATE AEQSDAENAA DKQPVADKAD 
51 EVEEKAGEPE REEPDGQAVR KKALTEEREQ TVREKAQKKD AETVKIQAVK 
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101 PSKETEKKAS KEEKKAAKEK VAPKPTPEQI LNSGSIEXAR SAAAKEVQKM 
151 XNVRQGGSXR IICKWARMPT VRARKGSVPN WQSWAYLPRW SVIRRDIKRF 
201 TGCKAAICLP MR* 

Further work revealed the complete nucleotide sequence <SEQ ID 383>: 



1 ATGTTTATGA ACAAATTTTC CCAATCCGGA AAAGGTCTGT CCGGTTTTTT 

51 CTTCGGTTTG ATACTGGCGA CGGTCATTAT TGCCGGTATT TTGTTTTATC 

101 TGAACCAGAG CGGTCAAAAT GCGTTCAAAA TCCCGGCTTC GTCGAAGCAG 

151 CCTGCAGAAA CGGAAATCCT GAAACCGAAA AACCAGCCTA AGGAAGACAT 

201 CCAACCTGAA CCGGCCGATC AAAACGCCTT GTCCGAACCG GATGCTGCGA 

251 CAGAGGCAGA GCAGTCGGAT GCGGAAAAAG CTGCCGACAA GCAGCCCGTT 

301 GCCGATAAAG CCGACGAGGT TGAAGAAAAG GCGGGCGAGC CGGAACGGGA 

351 AGAGCCGGAC GGACAGGCAG TGCGTAAGAA AGCGCTGACG GAAGAGCGTG 

4 01 AACAAACCGT CAGGGAAAAA GCGCAGAAGA AAGATGCCGA AACGGTTAAA 

451 AAACAAGCGG TAAAACCGTC TAAAGAAACA GAGAAAAAAG CTTCAAAAGA 

501 AGAGAAAAAG GCGGCGAAGG AAAAAGTTGC ACCCAAACCA ACCCCGGAAC 

551 AAATCCTCAA CAGCGGCAGC ATCGAAAAAG CGCGCAGTGC CGCCGCCAAA 

601 GAAGTGCAGA AAATGAAAAC GTCCGACAAG GCGGAAGCAA CGCATTATCT 

651 GCAAATGGGC GCGTATGCCG ACCGTCAGAG CGCGGAAGGG CAGCGTGCCA 

701 AACTGGCAAT CTTGGGCATA TCTTCCAAGG TGGTCGGTTA TCAGGCGGGA 

751 CATAAAACGC TTTACCGGGT GCAAAGCGGC AATATGTCTG CCGATGCGGT 

801 GAAAAAAATG CAGGACGAGT TGAAAAAACA TGAAGTCGCC AGCCTGATCC 

851 GTTCTATCGA AAGCAAATAA 

This corresponds to the amino acid sequence <SEQ ID 384; ORF65-l>: 

1 MFMNKFSQSG KGLSG FFFGL ILATVIIAGI LF YLNQSGQN AFKIPASSKQ 

51 PAETEILKPK NQPKEDIQPE PADQNALSEP DAATEAEQSD AEKAADKQPV 

101 ADKADEVEEK AGEPEREEPD GQAVRKKALT EEREQTVREK AQKKDAETVK 

151 KQAVKPSKET EKKASKEEKK AAKEKVAPKP TPEQILNSGS lEKARSAAAK 

201 EVQKMKTSDK AEATHYLQMG AYADRQSAEG QRAKLAILGI SSKWGYQAG 

251 HKTLYRVQSG NMSADAVKKM QDELKKHEVA SLIRSIESK* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N.meninsitidis (strain Al 

ORF65 shows 92.0% identity over a 150aa overlap with an ORP (ORF65a) from sfrain A oiN. 
meningitidis: 



ILKPHNQLKEDIQPDPADQNALSEPDAATE 
IIAGILF YLNQSGQNAFKIPVPSKQPAETEILKPKNQPKEDIQPEPADQNALSEPDAAKE 



AEQSDAENAADKQPVADKADEVEEKAGEPEREEPDGQAVRKKALTEEREQTVREKAQKKD 
AEQSDAEKAADKQPVADKADEVEEKADEPEREKSDGQAVRKKALTEEREQTVGEKAQKKD 



AETVKIQAVKPSKETEKKASKEEKKAAKEKVAPKPTPEQILNSGSIEXARSAAAKEVQKM 
AETVKKQAVKPSKETEKKASKEEKKAEKEJWAPKPTPEQILNSGSIEKARSAAAKEVQKM 



XNVRQGGSXRIICKWARMPTVRARKGSVPNWQSWAYLPRWSVIRRDIKRFTGCKAAICLP 
KTPDKAEATHYLQMGAYADRRSAEGQRAKLAILGI3SKWGYQAGHKTLYRVQSGNMSAD 



The complete length ORF65a nucleotide sequence <SEQ ID 385> is: 



1 ATGTTTATGA ACAAATTTTC CCAATCCGGA AAAGGTCTGT CCGGTTTTTT 
51 CTTCGGTTTG ATACTGGCGA CGGTCATTAT TGCCGGTATT TTGTTTTATC 
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TGAACCAGAG 
CCTGCAGAAA 
CCAACCTGAA 
AAGAGGCAGA 
GCCGACAAAG 
AAAGTCGGAC 
AACAAACCGT 
AAACAAGCGG 
AGAGAAAAAG 
AAATCCTCAA 
GAAGTGCAGA 
GCAAATGGGC 
AACTGGCAAT 
CATAAAACGC 
GAAAAAAATG 
GTTCTATCGA 



CGGTCAAAAT 
CGGAAATCCT 
CCGGCCGATC 
GCAGTCGGAT 
CCGACGAGGT 
GGACAGGCAG 
CGGGGAAAAA 
TAAAACCATC 
GCGGAGAAGG 
CAGCGGCAGC 
AAATGAAAAC 
GCGTATGCCG 
CTTGGGCATA 
TTTACCGGGT 
CAGGACGAGT 
AAGCAAATAA 



GCGTTCAAAA 
GAAACCGAAA 
AAAACGCCTT 
GCGGAAAAAG 
TGAGGAAAAG 
TGCGCAAGAA 
GCGCAGAAGA 
TAAAGAAACA 
AAAAAGTTGC 
ATCGAAAAAG 
GCCCGACAAG 
ACCGCCGGAG 
TCTTCCAAGG 
GCAAAGCGGC 
TGAAAAAACA 



TCCCGGTTCC 
AACCAGCCTA 
GTCCGAACCG 
CTGCCGACAA 
GCGGACGAGC 
AGCACTGACG 
AAGATGCCGA 
GAGAAAAAAG 
ACCCAAACCG 
CGCGCAGTGC 
GCGGAAGCAA 
CGCGGAAGGG 
TGGTCGGTTA 
AATATGTCTG 
TGAAGTCGCC 



GTCGAAGCAG 
AGGAAGACAT 
GATGCTGCGA 
GCAGCCCGTT 
CGGAGCGGGA 
GAAGAGCGTG 
AACGGTTAAA 
CTTCAAAAGA 
ACCCCGGAAC 
CGCTGCCAAA 
CGCATTATCT 
CAGCGTGCCA 
TCAGGCGGGA 
CCGATGCGGT 
AGCCTGATCC 



This encodes a protein having amino acid sequence <SEQ ID 386>: 

1 MFMNKFSQSG KGLSG FFFGL ILATVIIAGI LF YLNQSGQN AFKIPVPSKQ 

51 PAETEILKPK NQPKEDIQPE PADQNALSEP DAAKEAEQSD AEKAADKQPV 

101 ADKADEVEEK ADEPEREKSD GQAVRKKALT EEREQTVGEK AQKKDAETVK 

151 KQAVKPSKET EKKASKEEKK AEKEKVAPKP TPEQILNSGS lEKARSAAAK 

201 EVQKMKTPDK AEATHYLQMG AYADRRSAEG QRAKLAILGI SSKVVGYQAG 

251 HKTLYRVQSG NMSADAVKECM QDELKKHEVA SLIRSIESK* 

ORF65a and ORF65-1 show 96.5% identity in 289 aa overlap: 



orf 65a.pep MFMNKFSQSGKGL3GFFFGLILATVIIAGILFYLNQSGQNAFKIPVPSKQPAETEILKPK 
orf 65-1 MFMNKFSQSGKGLSGFFFGLILATVIIAGILFYLNQSGQNAFKIPASSKQPAETEILKPK 



NQPKEDIQPEPADQNALSEPDAAKEAEQSDAEKAADKQPVADKADEVEEKADEPEREKSD 
NQPKEDIQPEPADQNALSEPDAATEAEQSDAEKAADKQPVADKADEVEEKAGEPEREEPD 



ir f 65a . pep GQAVRKKALTEEREQTVGEKAQKKDAETVKKQAVKPSKETEKKASKEEKKAEKEKVAPKP 
.rf 55-1 GQAVRKKALTEEREQTVREKAQKKDAETVKKQAVKPSKETEKKASKEEKKAAKEKVAPKP 



irf 65a . pep TPEQILNSGSIEKARSAAAKEVQKMKTPDKAEATHYLQMGAYADRRSAEGQRAKLAILGI 
irf65-l TPEQI LN SGS I EKARS AAAKEVQKMKT S DKAEATHYLQMGAYADRQS AEGQRAKLAI LG I 



orf 65a. pep 
orf65-l 



SSKWGYQAGHKTLYRVQSGNMSADAVKKMQDELKKHEVASLIRSIESKX 
SSKWGYQAGHKTLYRVQSGNMSADAVKKMQDELKKHEVASLIRSIESKX 



Homologv with a predicted ORF from N.sonorrhoeae 

ORF65 shows 89.6% identity over a 212aa overlap with a predicted ORF (ORF65.ng) from A^. 
gonorrhoeae: 



ORF65ng IIAGILLYLNQGGQNAFKIPAPSKQPAETEILKLKNQPKEDIQPEPADQNALSEPDVAKE 
ORF65 ILKPHNQLKEDIQPDPADQNALSEPDAATE 
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ORF65ng AEQSDAEKAADKQPVADKADEVEEKAGEPEREEPDGQAVRKKALTEEREQTVREKAQKKD 
ORF65 AEQSDAENAADKQPVADKADEVEEKAGEPEREEPDGQAVRKKALTEEREQTVREKAQKKD 



ORF65ng AETVKKKAVKPSKETEKKASKEEKKAAKEKVAPKPTPEQILNSRSIEKARSAAAKEVQKM 
ORF65 AETVKIQAVKPSKETEKKASKEEKfCAAKEEWAPKPTPEQILNSGSIEXARSAAAKEVQPCM 



ORF55ng KNFGQGGSQRIICKWARMPNPGARKGSVPNWQSWAYLPKWSAIRRDIKRFTACKAAICPP 
15 I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I : I I : I I I I I I I I I : I I I I I I I 

ORF65 XNVRQGGSXRIICKWARMPTVRARKGSVPNWQSWAYLPRWSVIRRDIKRFTGCKAAICLP 
160 170 180 190 200 210 

ORF65ng MR 

20 II 

ORF65 MR 

An ORF65ng nucleotide sequence <SEQ ID 387> was predicted to encode a protein having amino 
acid sequence <SEQ ID 388>: 

1 MFMNKFSQSG K GLSGFFFGL ILATVIIAGI LLYLNQGGQN AFKIPAPSKQ 

25 51 PAETEILKLK NQPKEDIQPE PADQNALSEP DVAKEAEQSD AEKAADKQPV 

101 ADKADEVEEK AGEPEREEPD GQAVRKKALT EEREQTVREK AQKKDAETVK 

151 KKAVKPSKET EKKASKEEKK AAKEKVAPKP TPEQILNSRS lEKARSAAAK 

201 EVQKMKNFGQ GG3QRIICKW ARMPNPGARK GSVPNWQSWA YLPKWSAIRR 

251 DIKRFTACKA AICPPMR* 

30 After further analysis, the complete gonococcal DNA sequence <SEQ ID 389> was found to be: 

1 ATGTTTATGA ACAAATTTTC CCAATCCGGA AAAGGTCTGT CCGGTTTCTT 

51 CTTCGGTTTG ATACTGGCAA CGGTCATTAT TGCCGGTATT TTGCTTTATC 

101 TGAACCAGGG CGGTCAAAAT GCGTTCAAAA TCCCGGCTCC GTCGAAGCAG 

151 CCTGCAGAAA CGGAAATCCT GAAACTGAAA AACCAGCCTA AGGAAGACAT 

35 201 CCAACCTGAA CCGGCCGATC AAAACGCCTT GTCCGAACCG GATGTTGCGA 

251 AAGAGGCAGA GCAGTCGGAT GCGGAAAAAG CTGCCGACAA GCAGCCCGTT 

301 GCCGACAAag ccgaogAGGT TGAAGAAAag GcGGgcgAgc cggaACGGga 

351 aGAGCCGGAC ggACAGGCAG TGCGCAAGAA AGCACTGAcg gAAGAgcGTG 

4 01 AACAAACcgt cagggAAAAA GCGCagaaga AAGATGCCGA AACGgTTAAA 

40 4 51 AAacaaGCgg tAaaaccgtc tAAAGAAACa gagaaaaaag cTtcaaaaga 

501 agagaaaaag gcggcgaaag aaaAAGttgc acccaaaccg accccggaaC 

551 aaatcctcaa cagccgCagc atcgaaaaag cgcgtagtgc cgctgccaaa 

601 gaAgtgcaGA AAatgaaaaa ctTtgggcaa ggcgGaagcc aacgcattaT 

651 CTGcaaatgg gcgcgtatgc cgaccgtccg gagcgcggaA gggcagcgtg 

45 701 ccaaACtggc aAtcttgGgc atatctTccg aagtggtcgG CTATCAGGCG 

751 GGACATAAAA CGCTTTACCG CGTGCAAagc GGCAatatgt ccgccgatgc 

801 gGTGAAAAAA ATGCAGGACG AGTTGAAAAA GCATGGGGtt gcCAGCCTGA 

851 TCCGTGcgAT TGAAGGCAAA TAA 

This encodes the following amino acid sequence <SEQ ID 390>: 

50 1 MFMNKFSQSG KGLSG FFFGL ILATVIIAGI LL YLNQGGQN AFKIPAPSKQ 

51 PAETEILKLK NQPKEDIQPE PADQNALSEP DVAKEAEQSD AEKAADKQPV 

101 ADKADEVEEK AGEPEREEPD GQAVRKKALT EEREQTVREK AQKKDAETVK 

151 KQAVKPSKET EKKASKEEKK AAKEKVAPKP TPEQILNSRS lEKARSAAAK 

201 EVQKMKNFGQ GGSQRIICKW ARMPTVRSAE GQRAKLAILG ISSEWGYQA 

55 251 GHKTLYRVQS GNMSADAVKK MQDELKKHGV ASLIRAIEGK * 

ORF65ng-l and ORF65-1 show 89.0% identity in 290 aa overlap: 

10 20 30 40 50 60 

orf 65-1 . pep MFMNKFSQSGKGLSGFFFGLILATVIIAGILFYLNQSGQNAFKIPASSKQPAETEILKPK 

60 orf65ng-l MFMNKFSQSGKGLSGFFFGLIIATVIIAGILLYLNQGGQNAFKIPAPSKQPAETEILKLK 
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>rf 65-1 . pep NQPKEDIQPEPADQNALSEPDAATEAEQSDAEfCRADKQPVADKADEVEEKAGEPEREEPD 
irf65ng-l NQPKEDIQPEPADQNALSEPDVAKEAEQSDAEKAADKQPVADKADEVEEKAGEPEREEPD 



>rf 65-1 . pep GQAVRKKALTEEREQTVREKAQKKDAETVKKQAVKPSKETEKKASKEEKKAAKEKVAPKP 
)rf65ng-l GQAVRKKALTEEREQTVREKAQKKDAETVKKQAVKPSKETEKKASKEEKKAA^ 



TPEQILNSGSIEKARSAAAKEVQKMKTSDKAEATHYL-QMGAYADRQSAEGQRAKLAILG 



TPEQILNSRSIEKARSAAAKEVQKMKNFGQGGSQRIICKWARMPTVRSAEGQRAKLAILG 



orf 65-1 . pep ISSKWGYQAGHKTLYRVQSGNMSADAVKKMQDELKKHEVASLIRSIESKX 
orf65ng-l ISSEWGYQAGHKTLYRVQSGNMSADAVKKMQDELKKHGVASLIRAIEGKX 



On this basis, including the presence of a putative transmembrane domain in the gonococcal 
protein, it is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, 
could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 46 

The following DNA sequence, believed to be complete, was identified in N.meningitidis <SEQ ID 
391>: 



1 ATGAACCACG ACATCACTTT CCTCACCCTG TTCCTACTCG GTkTCTTCGG 

51 CGGAAcGCAC TGCATCGGTA TGTGCGGCGG ATTAAGCAGC GcGTTTGs . s 

101 TCCAACTCCC CCCGCATATC AACCGCTTTT GGCTGATCCT GCTGCTTAAC 

151 ACAGGACGGG TAAGCAGCTA TACGGCAAtC GGCCTGATAC TCGGATTAAT 

201 CGGACAGGTC GGCGTTTCAC TCGAcCAaAC CCGCGTCCTG CAGAATATTT 

251 TATACACGGC CGCCAACCTC CTGCTGCTCT TTTTAGGCTT ATACTTGAGC 

301 GGTATTTCTT CCTTGGCGGC AAAAATCGAG AAaATCGGCA AACCGATATG 

351 GCGGAACCTG AACCCGATAC TCAACCGGCT GTTACCCATA AAATCCATAC 

401 CCGCCTGCCT tGCGgTCGGA ATATTATGGG GCTGGCTGCC GTGCGGACTG 

451 GTTTACAGCG CGTCGCTTTA CGCGCTGGGA AgCGGTAGTG CGGCAACGGG 

501 CGGGTTATAT ATGCTTGCCT TTGCACTGGG TACGCTGCCC AATCTTtTAG 

551 CAATCGGCAT TTTtTCCCTG CAACTGAAwA AAATCATGCA AAACCGATAT 

601 ATCCGCCTGT GTACGGGATT ATCCGTATCA TTATGGGCAT TATGGAAACT 

651 TGCCGTCCTG TGGCTGTAA 

This corresponds to the amino acid sequence <SEQ ID 392; ORF103>: 

1 MNHDITFLTL FLLGXFGGTH CIGMCGGLSS AFXXQLPPHI NRFWLILLLN 

51 TGRVSSYTAI GLILGLIGQV GVSLDQTRVL QNILYTAANL LLLFLGLYLS 

101 GISSLAAKIE KIGKPIWRNL NPILNRLLPI KSIPACLAVG ILWGWLPCGL 

151 VYSASLYALG SGSAATGGLY MLAFALGTLP NLLAIGIFSL QLXKIMQNRY 

201 IRLCTGLSVS LWALWKLAVL WL* 

Further work elaborated the DNA sequence <SEQ ID 393> as: 



ATGAACCACG ACATCACTTT CCTCACCCTG TTCCTACTCG GTTTCTTCGG 
CGGAACGCAC TGCATCGGTA TGTGCGGCGG ATTAAGCAGC GCGTTTGCGC 
TCCAACTCCC CCCGCATATC AACCGCTTTT GGCTGATCCT GCTGCTTAAC 
ACAGGACGGG TAAGCAGCTA TACGGCAATC GGCCTGATAC TCGGATTAAT 
CGGACAGGTC GGCGTTTCAC TCGACCAAAC CCGCGTCCTG CAGAATATTT 
TATACACGGC CGCCAACCTC CTGCTGCTCT TTTTAGGCTT ATACTTGAGC 
GGTATTTCTT CCTTGGCGGC AAAAATCGAG AAAATCGGCA AACCGATATG 
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351 GCGGAACCTG AACCCGATAC TCAACCGGCT GTTACCCATA AAATCCATAC 

401 CCGCCTGCCT TGCGGTCGGA ATATTATGGG GCTGGCTGCC GTGCGGACTG 

451 GTTTACAGCG CGTCGCTTTA CGCGCTGGGA AGCGGTAGTG CGGCAACGGG 

501 CGGGTTATAT ATGCTTGCCT TTGCACTGGG TACGCTGCCC AATCTTTTAG 

551 CAATCGGCAT TTTTTCCCTG CAACTGAAAA AAATCATGCA AAACCGATAT 

601 ATCCGCCTGT GTACGGGATT ATCCGTATCA TTATGGGCAT TATGGAAACT 

651 TGCCGTCCTG TGGCTGTAA 

This corresponds to the amino acid sequence <SEQ ID 394; ORF103-1>: 



1 MMHDITFLTL FLLGFFGGTH CIGMCGGLSS AFA LQLPPHI NRFWLILLLN 

51 TGRVSSY TAI GLILGLIGQV GVSL DQTRVL QMILYTAAN L LLLFLGLYLS 

101 GISSLAAKIE KIGKPIWRNL NPILNRLLPI KSIP ACLAVG ILWGWLPCGL 

151 VYSASLYALG SGSAATGGLY M LAFALGTLP NLLAIGIF SL QLKKIMQNRY 

201 IRLCTGLSVS LWALWKLAVL WL^ 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N.meninsitidis (strain Al 

ORF103 shows 93.8% identity over a 222aa overlap with an ORF (ORF103a) from strain A of A'. 

meningitidis: 

10 20 30 40 50 60 

orfl03.pep MNHDITFLTLFLLGXFGGTHCIGMCGGLSSAFXXQLPPHINRFWLILLLNTGRVSSYTAI 
! I I I I I I I I I I I I I I I ! i I I I I I i I I I I ! I I I I I I I I I I I I I I I I I I I I I I I I I I 
orfl03a MNXDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRXWLILLLNTGRVSSYTAI 



orfl03.pep GLILGLIGQVGVSLDQTRVLQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 

I I j I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I I I 
orfl03a GLILGLIGQVGVSLDQTRVXQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 



130 140 150 160 170 180 

orf 103 . pep NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP 

orfl03a NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP 
130 140 150 160 170 180 



190 200 210 220 

orf 103 . pep NLLAIGIFSLQLXKIMQNRYIRLCTGLSVSLWALWKLAVLWLX 

II II II II II II I II II II II II II II II II I I M II I II II 
orf 103a NLXAIGIFSLQLXKIMQNRYIRLCTGLSVSLWALWKLAVLWLX 
190 200 210 220 

The complete length ORF103a nucleotide sequence <SEQ ID 395> is: 



1 ATGAACCANG ACATCACTTT CCTCACCCTG TTCCTACTCG GTTTCTTCGG 

51 CGGAACGCAC TGCATCGGTA TGTGCGGCGG ATTAAGCAGC GCGTTTGCGC 

101 TCCAACTCCC CCCGCATATC AACCGCTTNT GGCTGATCCT GCTGCTTAAC 

151 ACAGGACGGG TAAGCAGCTA TACGGCAATC GGCCTGATAC TCGGATTAAT 

201 CGGACAGGTC GGCGTTTCAC TCGACCAAAC CCGCGTCNTG CAGAATATTT 

251 TATACACGGC CGCCAACCTC CTGCTGCTCT TTTTAGGCTT ATACTTGAGC 

301 GGTATTTCTT CCTTGGCGGC AAAAATCGAG AAAATCGGCA AACCGATATG 

351 GCGGAACCTG AACCCGATAC TCAACCGGCT GTTACCCATA AAATCCATAC 

401 CCGCCTGCCT TGCGGTCGGA ATATTATGGG GCTGGCTGCC GTGCGGACTA 

451 GTTTACAGCG CGTCGCTTTA CGCGCTGGGA AGCGGTAGTG CGGCAACGGG 

501 CGGGTTATAT ATGCTTGCCT TTGCACTGGG TACGCTGCCC AATCTTTNGG 

551 CAATCGGCAT TTTTTCCCTG CAACTGNAAA AAATCATGCA AAACCGATAT 

601 ATCCGCCTGT GTACGGGATT ATCCGTATCA TTATGGGCAT TATGGAAACT 

651 TGCCGTCCTG TGGCTGTAA 

This encodes a protein having amino acid sequence <SEQ ID 396>: 

1 MNXDITFLTL FLLGFFGGTH CIGMCGGLSS AFA LQLPPHI NRXWLILLLN 

51 TGRVSSY TAI GLILGLIGQV GVSL DQTRVX QNILYTAAN L LLLFLGLYLS 

101 GISSLA AKIE KIGKPIWRNL NPILNRLLPI KSIP ACLAVG ILWGWLPCGL 
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151 VYSASLYALG SGSAATGGLY MLAFALGTLP NLXAIGIF SL QLXKIMQNRY 
201 IRLCTGLSVS LWALWKLAVL WL* 

ORFlOSa and ORF103-1 show 97.7% identity in 222 aa overlap: 

10 20 30 40 50 60 

MNXDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRXWLILLLNTGRVSSYTAI 
II I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I II I I I I I I I I I II I I II II I I II 
MNHDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRFWLILLLNTGRVSSYTAI 
10 20 30 40 50 60 

70 80 90 100 110 120 

GLILGLIGQVGVSLDQTRVXQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 

GLILGLIGQVGVSLDQTRVLQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 
70 80 90 100 110 120 



orflC3a.pep 
orfl03-l 

orf 103a. pep 
orfl03-l 



NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP 
NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP 



NLXAIGIFSLQLXKIMQNRYIRLCTGLSVSLWALWKLAVLWLX 
II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II II 
NLLAIGIFSLQLKKIMQNRYIRLCTGLSVSLWALWKLAVLWLX 



Homology with a predicted ORF from N.gonorrhoeae 

ORF103 shows 95.5% identity over a 222aa overlap with a predicted ORF (ORF103.ng) from A^. 
gonorrhoeae: 

orf 103 . pep MNHDITFLTLFLLGXFGGTHCIGMCGGLSSAFXXQLPPHINRFWLILLLNTGRVSSYTAI 60 
orfl03ng MNHDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRFWLILLLNTGRISSYTAI 60 

orf 103 . pep GLILGLIGQVGVSLDQTRVLQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 120 
orfl03ng GLMLGLIGQLGISLDQTRVLQNILYTASNLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 120 

orf 103 . pep NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP 180 
orfl03ng NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSATTGGLYMLAFALGTLP 180 

orf 103. pep NLLAIGIFSLQLXKIMQNRYIRLCTGLSVSLWALWKLAVLWL 222 
orfl03ng NLLAIGIFSLQLKKIMQNRYIRLCTGLSVSLWALWKLAVLWL 222 

The complete length ORFlOSng nucleotide sequence <SEQ ID 397> is: 



1 ATGAACCACG ACATCACTTT CCTCACCCTG TTCCTGCTCG GTTTCTTCGG 

51 CGGAACTCAC TGCATCGGTA TGTGCGGCGG ATTAAGCAGC GCGTTTGCGC 

101 TCCAACTCCC CCCGCATATC AACCGCTTTT GGCTGATTCT GCTGCTTAAC 

151 ACAGGACGGA TAAGCAGCTA TACGGCAATC GGCCTGATGC TCGGATTAAT 

201 CGGACAACTC GGCATTTCAC TCGACCAAAc ccgcgTCCTG CAAAATATTT 

251 tatacacagc ctccaaCCTC CTGCTGCTCT TTTTAGGCTT ATACTTGAGC 

301 GGTATTTCTT CCTTGGCGGC AAAAATCGAG AAAATCGGCA AACCGATATG 

351 GCGCAACCTG AACCCGATAC TCAACCGGCT GCTGCCCATA AAATCCATAC 

401 CCGCCTGCCT TGCTGTCGGA ATATTATGGG GCTGGCTGCC GTGCGGACTG 

451 GTTTACAGCG CATCACTTTA CGCGCTGGGA AGCGGTAGTG CGACAACCGG 

501 CGGACTGTAT ATGCTTGCCT TTGCACTGGG TACGCTGCCC AATCTTTTGG 

551 CAATCGGCAT TTTTTCCCTG CAACTGAAAA AAATCATGCA AAACCGATAT 

601 ATCCGCCTGT GTACAGGATT ATCCGTATCA TTATGGGCAT TATGGAAGCT 

651 TGCCGTCCTG TGGCTGTAA 

This encodes a protein having amino acid sequence <SEQ ID 398>: 
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1 MNHDITFLTL FLLGFFGGTH CIGMCGGLSS AFA LOLPPHI NRFWLILLLN 

51 TGRISSY TAI GLMLGLIGQL GISL DQTRVL QNILYTASN L LLLFLGLYLS 

101 GISSLA AKIE KIGKPIWRNL NPILNRLLPI KSIP ACLAVG ILWGWLPCGL 

151 VYSASLYALG SGSATTGGLY M LAFALGTLP NLLAIGIF SL QLKKIMQNRY 

201 IRLCTGLSVS LWALWKLAVL WL* 

In addition, ORF103ng and ORF103-1 show 97.3% identity in 222 aa overlap: 



MNHDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRFWLILLLNTGRVSSYTAI 
MNHDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRFWLILLLNTGRISSYTAI 



)rfl03-l.i 
irflOSng 



GLILGLIGQVGVSLDQTRVLQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 
GLMLGLIGQLGISLDQTRVLQNILYTASNLLLLFLGLYLSGISSLAAKIEKIGKPIWR^ 



>rfl03-l.i 
)rfl03ng 



NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I 
NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSATTGGLYMLAFALGTLP 



orfl03-l.pep 
orflOSng 



NLLAIGIFSLQLKKIMQNRYIRLCTGLSVSLWALWKLAVLWLX 



Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 47 



The following partial DNA sequence was identified in N.meningitidis <SEQ ID 399>: 

1 ATGGAAAACC AAAGGCCGCT CCTAGGCTTT CGCTTGGCAC TTTTGGCGGC 

51 GATGACGTGG GGAACGCTGC CGAT.TCCGT GCGGCAGGTA TTGAAGTTTG 

101 TCGATGCGCC GACGCTGGTG TGGGTGCGTT TTACCGTGGC GGCGGCGGTA 

151 TTGTTTGTTT TGCTGGCACT GGGCGGGCGG CTGCcGAAGC GGCGaGGATT 

201 TTTCTTGGTG CTCATTCAGG CTGCTGCTGC TCGGCGTGGC GGGCATTTCG 

251 GCAAACTTTG TGCTGATTGC CCAAGGGCTG CATTATATTT CGCCGACCAC 

301 GACGCAGGTT TTGTGGCAGA TTTCGCCGTT TACGATGATT GTwGTCGGTG 

351 TGTTGGTGTT TAAAGACCGG ATGACTGCCG CTCAGAAAAT CGGCTTGGTT 

401 TTGCTGCTTG CCGGTTTGCT TATGTATTTT AACGATAAAT TCGGCGAGTT 

451 GTCGGGTTTG GGCGCGTATG C.AAGGGCGT GTTGCTGTGT GCGGCAGGCA 

501 GTATGGCATG GGTGTGTAAT GCCGTGGCGC AAAAGCTGCT GTCGGCGCAA 

551 TTCGGGCCGC AACAGATTCT GCTGTTGATT TATGCGGCAA GTGCCGCCGT 

601 GTTCCTGCCG TTTGCCGAAC CGGCACACAT CGGAAGTATG GACGGTACGT 

651 TGGCGTGGGT ATGTATTGCG TATTGCTGCT TGAATACGTT AATCGGTTAC 

701 GGCTCGTTCG GCGAGGCGTT GAAACATTGG GAGGCTTCCA AAGTCAGCGC 

751 GGTAACAACC TTGCTCCCCG TGTTTACCGT AATAAATACT TTGCTCGGGC 

801 ATTATGTGAT GCCTGAAACT TTTGCCGCGC CGGA. . 

This corresponds to the amino acid sequence <SEQ ID 400; ORF104>: 

1 MENQRPLLGF RLALLAAMTW GTLPXSVRQV LKFVDAPTLV WVRFTVAAAV 

51 LFVLLALGGR LPKRRDFSWC SFRLLLLGVA GISANFVLIA QGLHYISPTT 

101 TQVLWQISPF TMIWGVLVF KDRMTAAQKI GLVLLLAGLL MYFNDKFGEL 

151 SGLGAYXKGV LLCAAGSMAW VCNAVAQKLL SAQFGPQQIL LLIYAASAAV 

201 FLPFAEPAHI GSMDGTLAWV CIAYCCLNTL IGYGSFGEAL KHWEASJCVSA 
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251 VTTLLPVFTV INTLLGHYVM PETFAAP . . . 

Further work revealed further partial DNA sequence <SEQ ID 40 1>: 

1 ATGGAAAACC AAAGGCCGCT CCTAGGCTTC GCGTTGGCAC TTTTGGCGGC 

51 GATGACGTGG GGAACGCTGC CGATTGCCGT GCGGCAGGTA TTGAAGTTTG 

101 TCGATGCGCC GACGCTGGTG TGGGTGCGTT TTACCGTGGC GGCGGCGGTA 

151 TTGTTTGTTT TGCTGGCACT GGGCGGGCGG CTGCCGAAGC GGCGGGATTT 

201 TTCTTGGTGC TCATTCAGGC TGCTGCTGCT CGGCGTGGCG GGCATTTCGG 

251 CAAACTTTGT GCTGATTGCC CAAGGGCTGC ATTATATTTC GCCGACCACG 

301 ACGCAGGTTT TGTGGCAGAT TTCGCCGTTT ACGATGATTG TTGTCGGTGT 

351 GTTGGTGTTT AAAGACCGGA TGACTGCCGC TCAGAAAATC GGCTTGGTTT 

401 TGCTGCTTGC CGGTTTGCTT ATGTTTTTTA ACGATAAATT CGGCGAGTTG 

451 TCGGGTTTGG GCGCGTATGC GAAGGGCGTG TTGCTGTGTG CGGCAGGCAG 

501 TATGGCATGG GTGTGTTATG CCGTGGCGCA AAAGCTGCTG TCGGCGCAAT 

551 TCGGGCCGCA ACAGATTCTG CTGTTGATTT ATGCGGCAAG TGCCGCCGTG 

601 TTCCTGCCGT TTGCCGAACC GGCACACATC GGAAGTTTGG ACGGTACGTT 

651 GGCGTGGGTT TGTTTTGCGT ATTGCTGCTT GAATACGTTA ATCGGTTACG 

701 GCTCGTTCGG CGAGGCGTTG AAACATTGGG AGGCTTCCAA AGTCAGCGCG 

751 GTAACAACCT TGCTCCCCGT GTTTACCGTA ATAwTwwCTT TGCTCGGGCA 

801 TTATGTGATG CCTGAAACTT TTGCCGCGCC GGA. . . 

This corresponds to the amino acid sequence <SEQ ID 402; ORF104-1>: 

1 MENQRPLLGF ALALLAAMT W GTLPIAVRQV LKFVDAPT LV MVRFTVAAAV 

51 LFVLLA LGGR LPKRRDFSWC SFR LLLLGVA GISANFVLIA QGLHYISPTT 

101 TQ VLWQISPF TMIWGVLV F KDRMTA AQKI GLVLLLAGLL MFF NDKFGEL 

151 SGLGAYAKG V LLCAAGSMAW VCYAVA QKLL SAQFGPQQ IL LLIYAASAAV 

201 FLPFAE PAHI GSLD GTLAWV CFAYCCLNTL I GYGSFGEAL KHWEASKVSA 

251 VTTLLPVFTV IXXL LGHYVM PETFAAP. . . 

Computer analysis of this amino acid sequence gave the following results: 

Homology with hypothetical HI0878 protein of H. influenzae (accession number 1132769) 

ORF104 and HI0878 show 40% aa identity in 277aa overlap: 

orfl04 4 QRPLLGFRLALLAAMTWGTLPXSVRQVLKFVDAPTLVWXXXXXXXXXXXXXXXXXXXXP- 62 

Q+PLLGF AL+ AM WG+LP +++QVL ++A T+VW P 
HI0878 3 QQPLLGFTFALITAMAWGSLPIALKQVLSVMNAQTIVWYRFIIAAVSLLALLAYKKQLPE 62 

orfl04 63 — KRRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIWGVLVF 120 

K R ++W ++L+GV G+++NF+L + L+YI P+ Q+ +8 F M++ GVL+F 
HI0878 63 LMKVRQYAW IMLIGVIGLTSNFLLFSSSLNYIEPSVAQIFIHLSSFGMLICGVLIF 118 

orfl04 121 KDRMTAAQKIXXXXXXXXXXMYFNDKFGELSGLGAYXKGVLLCAAGSMAWVCNAVAQKLL 180 

K+++ QKI ++FND+F +GL Y GV+L G++ WV +AQKL+ 

HI0878 119 KEKLGLHQKIGLFLLLIGLGLFFNDRFDAFAGLNQYSTGVILGVGGALIWVAYGMAQKLM 178 

orfl04 181 SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSMDGTLAWVCIAYCCLNTLIGYGSFGEAL 240 

+F QQILL++Y A F+P A+ + + + LA +C YCCLNTLIGYGS+ EAL 
HI0878 179 LRKFNSQQILLMMYLGCAIAFMPMADFSQVQELT-PLALICFIYCCLNTLIGYGSYAEAL 237 

orfl04 241 KHWEASKVSAVTTLLPVFTVINTLLGHYVMPETFAAP 277 

W+ SKVS V TL+P+FT++ + + HY P FAAP 
HI0878 238 NRWDVSKVSWITLVPLFTILFSHIAHYFSPADFAAP 274 

Homology with a predicted ORF from N.meninsitidis (strain A) 

ORF104 shows 95.3% identity over a 277aa overlap with an ORF (ORF104a) from sfrain A of A^. 
meningitidis: 

10 20 30 40 50 60 

orf 104 .pep MENQRPLLGFRLALLAAMTWGTLPXSVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR 
I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I 
or f 1 0 4a MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR 

10 20 30 40 50 60 



70 



90 100 110 120 
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orfl04.pep LPKRRDFSWCS FRLLLLGVAGI SANFVLIAQGLHYI S PTTTQVLWQI S PFTMIWGVLVF 
orfl04a LPKWRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIWGVLVF 



KDRMTAAQKIGLVLLLAGLLMYFNDKFGELSGLGAYXKGVLLCAAGSMAWVCNAVAQKLL 
KDRMTAAQKIGLVLLLAGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL 



irfl04 .pep SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSMDGTLAWVCIAYCCLNTLIGYGSFGEAL 
irf 104a SAQFGPQQILLLIYAASAAVFLPFAELAHIGSLDGTLAWVCFAYCCLNTLIGYGSFGEAL 



irf 104 . pep KHWEASKVSAVTTLLPVFTVINTLLGHYVMPETFAAP 



The complete length ORF104a nucleotide sequence <SEQ ID 403> is: 



ATGGAAAACC 
GATGACGTGG 
TCGATGCGCC 
TTGTTTGTTT 
TTCTTGGTGC 
CAAACTTTGT 
ACGCAGGTTT 
GTTGGTGTTT 
TGCTGCTTGC 
TCGGGTTTGG 
TATGGCATGG 
TCGGGCCGCA 
TTCCTGCCGT 
GGCGTGGGTT 
GCTCGTTCGG 
GTAACAACCT 
TTATGTGATG 
ATGCCGGCGC 
GACAGGCTGT 



AAAGGCCGCT 
GGAACGCTGC 
GACGCTGGTG 
TGCTGGCATT 
TCATTCAGGC 
GCTGATTGCC 
TGTGGCAGAT 
AAAGACCGGA 
CGGTTTGCTT 
GCGCGTATGC 
GTGTGTTATG 
ACAGATTCTG 
TTGCCGAACT 
TGTTTTGCGT 
CGAGGCGTTG 
TGCTCCCCGT 
CCTGATACTT 
ACTGGTCGTG 
TCAAACGCCG 



CCTAGGCTTC 
CGATTGCCGT 
TGGGTGCGTT 
GGGCGGGCGG 
TGCTGCTGCT 
CAAGGGCTGC 
TTCGCCGTTT 
TGACTGCCGC 
ATGTTTTTTA 
GAAGGGCGTG 
CCGTGGCGCA 
CTGTTGATTT 
GGCACACATC 
ATTGCTGCTT 
AAACATTGGG 
GTTTACCGTA 
TTGCCGCGCC 
GTCGGGGGTG 
CTAG 



GCGTTGGCAC 
GCGGCAGGTA 
TTACCGTGGC 
CTGCCGAAGT 
CGGCGTGGCG 
ATTATATTTC 
ACGATGATTG 
TCAGAAAATC 
ACGATAAATT 
TTGCTGTGTG 
AAAGCTGCTG 
ATGCGGCAAG 
GGAAGTTTGG 
GAATACGTTA 
AGGCTTCCAA 
ATATTTTCTT 
GGATATGAAC 
CGGTTACGGC 



TTTTGGCGGC 
TTGAAGTTTG 
GGCGGCGGTA 
GGCGGGATTT 
GGCATTTCGG 
GCCGACCACG 
TTGTCGGTGT 
GGCTTGGTTT 
CGGCGAGTTG 
CGGCAGGCAG 
TCGGCGCAAT 
TGCCGCCGTG 
ACGGTACGTT 
ATCGGTTACG 
AGTCAGCGCG 
TGCTCGGGCA 
GGTTTGGGTT 
GGCGGTGGGG 



This encodes a protein having amino acid sequence <SEQ ID 404>: 

1 MENQRPLLGF ALALLAAMT W GTLPIAVRQV LKFVDAPT LV VfVRFTVAAAV 

51 LFVLL ALGGR LPKWRDFSWC SFR LLLLGVA GISANFVLIA QGLHYISPTT 

101 TQ VLWQISPF TMIWGVLV F KDRMT AAQKI GLVLLLAGLL MFF NDKFGEL 

151 SGLGAYAKG V LLCAAGSMAW VCYAVA QKLL SAQFGPQQ IL LLIYAASAA V 

201 FLPFAELAHI GSLD GTLAWV CFAYCCLNTL I GYGSFGEAL KHWEASKVSA 

251 VTTLLPVFTV IFSL LGHYVM PDTFAAPDMN GL GYAGALW VGGAVTAAV G 

301 DRLFKRR* 

ORF104a and ORF104-1 show 98.2% identity in 277 aa overlap: 



MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR 
MENQRPLLGFAUiLLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTviJJlVLBVLIAL^ 



LPKWRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIWGVLVF 
LPKRRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIWGVLVF 



KDRMTAAQKIGLVLLLAGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL 
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KDRMTAAQKIGLVLLLAGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL 



orf 104a. pep SAQFGPQQILLLIYAASAAVFLPFAELAHIGSLDGTLAWVCFAYCCLNTLIGYGSFGEAL 
orf 104-1 SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSLDGTLAWVCFAYCCLNTLIGYGSFGEAL 



KHWEASKVSAVTTLLPVFTVIFSLLGHYVMPDTFAAPDMNGLGYAGALVWGGAVTAAVG 
KHWEASKVSAVTTLLPVFTVIXXLLGHYVMPETFAAP 



Homology with a predicted ORF from N.sonorrhoeae 

ORF104 shows 93.9% identity over a 277aa overlap with a predicted ORF (ORF104.ng) from A^. 
gonorrhoeae: 

MENQRPLLGFRLALLAAMTWGTLPXSVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR 60 
MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR 60 

LPKRRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQI3PFTMIWGVLVF 120 

LPKRRDFSWHSFRLLLLGVTGISANFVLIAQGLHYISPTTTQVLWQISPFTMIWGVLVF 120 



orfl04 .pep 
orfl04ng 
orf 104 .pep 
orfl04ng 
orfl04 .pep 
orf 104ng 
orf 104 .pep 
orf 104ng 
orf 104 .pep 
orfl04ng 



KDRMTAAQKIGLVLLLAGLLMYFNDKFGELSGLGAYXKGVLLCAAGSMAWVCNAVAQKLL 
KDRMTAAQKIGLVLLLVGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL 



SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSMDGTLAWVCIAYCCLNTLIGYG3FGEAL 240 

SAQFGPQQILLLIYAASAAVFLLXAEPAHIGSLDGTLAWVCFVYCCLNTLIGYG3FGEAL 240 

KHWEASKVSAVTTLLPVFTVINTLLGHYVMPETFAAP 277 
I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I : I I I I I 

KHWEASKVSAVTTLLPVFTVIFSLLGHYVMPDTFAAPDMNGLGYVGALVWGGAVTAAVG 300 

The complete length ORF104ng nucleotide sequence <SEQ ID 405> is predicted to encode a 
protein having amino acid sequence <SEQ ID 406>: 

1 MENORPLLGF ALALLAAMT W GTLPIAVRQV LKFVDAPT LV WVRFTVAAAV 

51 LFVLL ALGGR LPKRRDFSWH SFR LLLLGVT GISANFVLIA QGLHYISPTT 

101 TQ VLWQISPF TMIWGVLV F KDRMTA AQKI GLVLLLVGLL MFF NDKFGEL 

151 SGLGAYAKG V LLCAAGSMAW VCYAVA QKLL SAQFGPQ QIL LLIYAASAAV 

201 FLLXA EPAHI GSL DGTLAWV CFVYCCLNTL IGYGSFGEAL KHWEAS KVSA 

251 VTTLLPVFTV IFS LLGHYVM PDTFAAPDMN G LGYVGALW VGGAVTAA VG 

301 DRPFKRR* 

Further work revealed the complete gonococcal nucleotide sequence <SEQ ID 407>: 



ATGGAAAACC 
GATGACGTGG 
TCGATGCGCC 
TTGTTTGTTT 
TTCTTGGCAT 
CAAACTTTGT 
ACGCAGGTTT 
GTTGGTGTTT 
TGCTGCttgT 
TCGGGTTTGG 
TATGGCCTGG 
TCGGGCCGCA 
TTCCtgccgT 
GGCGTGGGTT 



AAAGGCCGCT 
GGGACGCTGC 
GACGCTGGTG 
TGCTGGCATT 
TCATTCAGGC 
GCTGATTGCC 
TGTGGCAGAT 
AAAGACCGGA 
CGGTttgCTT 
GCGCGTATGC 
GTGTGTTATG 
ACAGATTCTG 
TTGccgaaCC 
TGTTTTGTGT 



CCTAGGCTTC 
CGATTGCCGT 
TGGGTGCGTT 
GGGCGGGCGG 
TGCTGCTGCT 
CAAGGGCTGC 
TTCGCCGTTT 
tgaCTGCCGC 
ATGTTTTtta 
GAAGGGCGTG 
CCGTGGCGCA 
CTGTTGATTT 
GGCACACATC 
ATTGCTGCTT 



GCGTTGGCAC 
GCGGCAGGTA 
TTACCGTGGC 
CTGCCGAAGC 
CGGCGTGACG 
ATTATATTTC 
ACGATGATTG 
GCAGAAAATC 
ACGACAAATT 
TTGCTGTGTG 
AAAGCTGCTG 
ATGCGGcaag 
GGAAGTTTgg 
GAATACGTTA 



TTTTGGCGGC 
TTGAAGTTTG 
GGCGGCGGTA 
GGCGGGATTT 
GGCATTTCGG 
GCCGACCACG 
TTGTCGGCGT 
GGTTTGGTTT 
CGGCGAGTTG 
CGGCAGGCAG 
TCGGCGCAAT 
tgccgccGTG 
aCGGTACGtt 
ATCGGTTACG 
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7 01 GCTCGTTCGG CGAGGCGTTG AAACATTGGG AGGCTTCCAA AGTCAGCGCG 

7 51 GTAACAACCT TGCTCCCCGT GTTTACCGTA ATATTTTCTT TGCTCGGGCA 

801 TTATGTGATG CCTGATACTT TTGCCGCGCC GGATATGAAC GGTTTGGGTT 

851 ATGTCGGCGC ACTGGTCGTG GTCGGGGGTG CGGTTACGGC GGCGGTGGGG 

901 GACAGGCCGT TCAAACGCCG CTAG 

This corresponds to the amino acid sequence <SEQ ID 408; ORF104ng-l>: 

1 MENQRPLLGF ALALLAAMT W GTLPIAVRQV LKFVDAPT LV WVRFTVAAAV 

51 LFVLL ALGGR LPKRRDFSWH SFR LLLLGVT GISANFVLIA QGLHYISPTT 

101 TQ VLWQISPF TMIWGVLV F KDRMTA AQKI GLVLLLVGLL MFF NDKFGEL 

151 SGLGAYAKG V LLCAAGSMAW VCYAVA QKLL SAQFGPQQ IL LLIYAASAAV 

201 FLPFA EPAHI GSLD GTLAWV CFVYCCLNTL I GYGSFGEAL KHWEASK VSA 

251 VTTLLPVFTV IFSL LGHYVM PDTFAAPDMN GL GYVGALW VGGAVTAAV G 

301 DRPFKRR* 

ORF104ng-l and ORF104-1 show 97.5% identity in 277 aa overlap: 



irf 104-1 . pep MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR 
>rfl04ng-l MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR 



. . pep LPKRRDFSWCSFRLLLLGVAGI3ANFVLIAQGLHYISPTTTQVLWQISPFTMIWGVLVF 
f-1 LPKRRDFSWHSFRLLLLGVTGISANFVLIAQGLHYISPTTTQVLWQISPFTMIWGVLVF 



orf 104-1. pep 
orfl04ng-l 



KDRMTAAQKIGLVLLLAGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL 
KDRMTAAQKIGLVLLLVGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL 



orf 104-1. pep 
orfl04ng-l 



SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSLDGTLAWVCFAYCCLNTLIGYGSFGEAL 
SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSLDGTLAWVCFVYCCLNTLIGYGSFGEAL 



orfl04-l.i 
orf 104ng-: 



KHWEASKVSAVTTLLPVFTVIXXLLGHYVMPETFAAP 

KHWEASKVSAVTTLLPVFTVIFSLLGHYVMPDTFAAPDMNGLGYVGALWVGGAVTAAVG 



In addition, ORF104ng-l shows significant homology with a hypothetical H.influenzae protein: 

gi 11573895 (D32769) hypothetical [Haemophilus influenzae] Length = 306 
Score = 237 bits (598), Expect = 8e-62 

i = 114/280 (40%), Positives = 168/280 (59%), Gaps = 8/280 (2%) 

QRPXXXXXXXXXXXMTWGTLPIAVRQVLKFVDAPTLVWXXXXXXXXXXXXXXXXXXXXP- 8 8 
Q+P M WG+LPIA++QVL ++A T+VW P 

QQPLLGFTFALITAMAWGSLPIALKQVLSVMNAQTIVWYRFIIAAVSLLALLAYKKQLPE 62 

— KRRDF3WHSFRLLLLGVTGISANFVLIAQGLHYISPTTTQVLWQISPFTMIWGVLVF 14 6 

K R ++W ++L+GV G+++NF+L + L+YI P+ Q+ +S F M++ GVL+F 
LMKVRQYAW IMLIGVIGLTSNFLLFSSSLNYIEPSVAQIFIHLSSFGMLICGVLIF 118 



+GL Y+ GV+L G++ WV Y +AQKL+ 



Ident; 






30 


Sbjct: 


3 




89 


Sbjct: 


63 


Query : 


147 


Sbjct: 


119 




207 


Sbjct: 


179 


Query: 


267 



LA +CF+YCCLNTLIGYGS+ EAL 



267 KHWEASKVSAVTTLLPVFTVIFSLLGHYVMPDTFAAPDMN 306 
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W+ SKVS V TL+P+FT++FS + HY P FAAP++N 
Sbjct: 238 NRWDVSKVSVVITLVPLFTILFSHIAHYFSPADFAAPELN 277 

Based on this analysis, including the presence of a putative leader sequence and several putative 
transmembrane domains in the gonococcal protein, it is predicted that the proteins from 
N. meningitidis and N.gonorrhoeae, and their epitopes, could be useful antigens for vaccines or 
diagnostics, or for raising antibodies. 



Example 48 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 409>: 

1 ATGGTAGCTC GTCGGGCTCA TAACCCGAAG GTCGTAGGTT CGAATCCTGT 

51 .CCCGCAACC TAATTTCAAA CCCCTCGGTT CAATGCCGAG GG.GTTTTGT 

101 T.TTGCCTGT TTCCTGTTTC CTGTTTCCTG CCGCCTCCGT TTTTTGCCGG 

151 ATTTTCCTTC CGGCCGCAAT ATCGGAACGG CAGACCGCCG TCTGTTTGCG 

201 GTTGCAAATT CAGGCAGTTT GGCTACAATC TTCCGCATTG TCTTCAAGAA 

251 AGCCAACCAT GCCGACCGTC CGTTTTACCG AATCCGTCAG CAAACAAGAC 

301 CTTGATGCTC TGTTCGAGTG GGCAAAAGCA AGTTACGGTG CAGAAAGTTG 

351 CTGGAAAACG CTGTATCTGA ACGGTCysCC TTTGGGCAAC CTGTCGCCGG 

4 01 AATGGGTGGA ACGCGTsmmA AAAGACTGGG AGGCAGGCTG CyCGGAGTCT 

4 51 TCAGACGGCA TTTTTCTGAA TgCGGACGGc TGgCctGATA TGGgCGGAcg 

501 cTTACAGCAC CTCGCCCTCG GTTGGCACTG TGCGGGGCTG TTGGACGgsT 

551 GGCGCAACGA GTGTTTCGAC CTGACCGACG GCGGCGGCAA CCCCTTGTTC 

601 ACGCTCGaAc GCGCCGyTTT itlCGTCCTkTC GGACTGCTCA GCCGCGCCGT 

651 CCATCTCAAC GGTCTGACCG AATCGGACGG CCGATGGCAT TTCTGGATAG 

701 GCAGGCGCAG TCCGCACAAA GCAGTCGATC CCAACAAACT CGACAATACT 

751 rCCGCCGGCG GTGTTTCCGG CGGCGAAATG CCGTCTGAAG CCGTGTGTCG 

801 CGAAAGCAGC GAAGAAGCCG GTTTGGATAA AACGCTGcTT CCGCTCATCC 

851 GCCCGGTATC GCAGCTGCAC AGCCTGCGCT CCGTCAGCCG GGGTGTACAC 

901 AATGAAATCC TGTATGTATT CGATGCCGTC CTGCCG... 

This corresponds to the amino acid sequence <SEQ ID 410; ORF105>: 

1 MVARRAHNPK VVGSNPXPAT XFQTPRFNAE XVLXLPVSCF LFPAASVFCR 

51 IFLPAAISER QTAVCLRLQI QAVWLQSSAL SSRKPTMPrV RFTESVSKQD 

101 LDALFEWAKA SYGAESCWKT LYLNGXPLGN LSPEWVERVX KDWEAGCXES 

151 SDGIFLNADG WPDMGGRLQH LALGWHCAGL LDGWRNECFD LTDGGGNPLF 

201 TLERAXXRPX GLLSRAVHLN GLTESDGRWH FWIGRRSPHK AVDPNKLDNT 

251 XAGGVSGGEM PSEAVCRESS EEAGLDKTLL PLIRPVSQLH SLRSVSRGVH 

301 NEILYVFDAV LP. . . 

Further work revealed the complete nucleotide sequence <SEQ ID 41 1>: 

1 ATGCCGACCG TCCGTTTTAC CGAATCCGTC AGCAAACAAG ACCTTGATGC 

51 TCTGTTCGAG TGGGCAAAAG CAAGTTACGG TGCAGAAAGT TGCTGGAAAA 

101 CGCTGTATCT GAACGGTCTG CCTTTGGGCA ACCTGTCGCC GGAATGGGTG 

151 GAACGCGTCA AAAAAGACTG GGAGGCAGGC TGCTCGGAGT CTTCAGACGG 

201 CATTTTTCTG AATGCGGACG GCTGGCCTGA TATGGGCGGA CGCTTACAGC 

251 ACCTCGCCCT CGGTTGGCAC TGTGCGGGGC TGTTGGACGG CTGGCGCAAC 

301 GAGTGTTTCG ACCTGACCGA CGGCGGCGGC AACCCCTTGT TCACGCTCGA 

351 ACGCGCCGCT TTCCGTCCTT TCGGACTGCT CAGCCGCGCC GTCCATCTCA 

401 ACGGTCTGAC CGAATCGGAC GGCCGATGGC ATTTCTGGAT AGGCAGGCGC 

451 AGTCCGCACA AAGCAGTCGA TCCCAACAAA CTCGACAATA CTGCCGCCGG 

501 CGGTGTTTCC GGCGGCGAAA TGCCGTCTGA AGCCGTGTGT CGCGAAAGCA 

551 GCGAAGAAGC CGGTTTGGAT AAAACGCTGC TTCCGCTCAT CCGCCCGGTA 

601 TCGCAGCTGC ACAGCCTGCG CTCCGTCAGC CGGGGTGTAC ACAATGAAAT 

651 CCTGTATGTA TTCGATGCCG TCCTGCCCGA AACCTTCCTG CCTGAAAATC 

701 AGGATGGCGA AGTGGCGGGT TTTGAGAAAA TGGACATCGG CGGTCTGTTG 

751 GATGCCATGT TGTCGGGAAA CATGATGCAC GACGCGCAAC TGGTTACGCT 

801 GGACGCGTTT TGCCGTTACG GTCTGATTGA TGCCGCCCAT CCGCTGTCCG 

851 AGTGGCTGGA CGGCATACGT TTATAG 



This corresponds to the amino acid sequence <SEQ ID 412; ORF105-1>: 



wo 99/24578 



-255- 



PCT/IB98/01665 



1 MPTVRFTESV SKQDLDALFE WAKASYGAES CWKTLYLNGL PLGNLSPEWV 

51 ERVKKDWEAG CSESSDGIFL NADGWPDMGG RLQHLALGWH CAGLLDGWRN 

101 ECFDLTDGGG NPLFTLERAA FRPFGLLSRA VHLNGLTESD GRWHFWIGRR 

151 SPHKAVDPNK LDNTAAGGVS GGEMPSEAVC RESSEEAGLD KTLLPLIRPV 

201 SQLHSLRSV3 RGVHNEILYV FDAVLPETFL PENQDGEVAG FEKMDIGGLL 

251 DAMLSGNMMH DAQLVTLDAF CRYGLIDAAH PLSEWLDGIR L* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted QRF from N.meninsitidis (strain A) 

ORF105 shows 89.4% identity over a 226aa overlap with an ORF (ORFlOSa) from strain A of A^. 
meningitidis: 

60 70 80 90 100 110 

or f 105. pep ISERQTAVCLRLQIQAVWLQSSALSSRKPTMPTVRFTESVSKQDLDALFEWAKASYGAES 

or f 105a MPTVRFTESVSKHDLDALFEWAKASYGAES 

10 20 30 

120 130 140 150 160 170 

orflOS.pep CWKTLYLNGXPLGNLSPEWVERVXKDWEAGCXESSDGIFLNADGWPDMGGRLQHLALGWH 

orfl05a CWKTLYLNGLPLGNLSPEWAERVKKDWEAGCSESSDGIFLNADGWPDMGRRLQHLARIWK 
40 50 50 70 80 90 

180 190 200 210 220 230 

orfl05.pep CAGLLDGWRNECFDLTDGGGNPLFTLERAXXRPXGLLSRAVHLNGLTESDGRWHFWIGRR 

I I I I I I I : I I I I I I I I I : I I I I : I I I I II I I I I I I I I I I II : I I 

orfl05a EAGLLHGWRDECFDLTDGGSNPLFALERAAFRPFGLLSRAVHLNGLVESDGRWHFWIGRR 
100 110 120 130 140 150 

240 250 260 270 280 290 

or f 105. pep SPHKAVDPNKLDNTXAGGVSGGEMPSEAVCRESSEEAGLDKTLLPLIRPVSQLHSLRSVS 

or f 105a SPHKAVDPDKLDNTAAGGVSSGELPSETVCRESSEEAGLDKTLLPLIRPVSQLHSLRPVS 
160 170 180 190 200 210 

300 310 
orfl05.pep RGVHNEILYVFDAVLP 
M I I M I I I II I I I I I 

orfl05a RGVHNEILYVFDAVLPETFLPENQDGEVAGFEKMDIGGLLAAMLSGNMMHDAQLVTLDAF 
220 230 240 250 260 270 

The complete length ORFlOSa nucleotide sequence <SEQ ID 413> is: 



1 ATGCCGACCG TCCGTTTTAC CGAATCCGTC AGCAAACACG ACCTTGATGC 

51 CCTATTCGAG TGGGCAAAGG CAAGTTACGG TGCGGAAAGT TGCTGGAAAA 

101 CGCTGTATCT GAACGGTCTG CCTTTGGGCA ATCTGTCGCC GGAATGGGCG 

151 GAGCGCGTCA AAAAAGACTG GGAGGCAGGC TGCTCGGAGT CTTCAGACGG 

2 01 CATTTTCCTG AATGCGGACG GCTGGCCAGA TATGGGCAGA CGCTTGCAGC 

2 51 ACCTCGCCCG AATATGGAAA GAAGCGGGAC TGCTTCACGG CTGGCGCGAC 

301 GAGTGTTTCG ACCTGACCGA CGGCGGCAGC AATCCCTTGT TCGCGCTCGA 

351 ACGCGCCGCT TTCCGTCCGT TCGGACTGCT CAGCCGCGCC GTCCATCTCA 

4 01 ACGGTTTGGT CGAATCGGAC GGCCGATGGC ATTTCTGGAT AGGCAGGCGC 

4 51 AGTCCGCACA AAGCAGTCGA TCCCGACAAA CTCGACAATA CTGCCGCCGG 

501 CGGTGTTTCC AGCGGTGAAT TGCCGTCTGA AACCGTGTGT CGCGAAAGCA 

551 GCGAAGAAGC CGGTTTGGAT AAAACGCTGC TTCCGCTCAT CCGCCCGGTA 

601 TCGCAGCTGC ACAGCCTGCG CCCCGTCAGC CGGGGTGTGC ACAATGAAAT 

651 CCTGTATGTA TTCGATGCCG TCCTGCCCGA AACCTTCCTG CCTGAAAATC 

7 01 AGGATGGCGA AGTGGCGGGT TTTGAGAAAA TGGACATCGG CGGTCTGTTG 

7 51 GCTGCCATGT TGTCGGGAAA CATGATGCAC GACGCGCAAC TGGTTACGCT 

801 GGACGCGTTT TGCCGTTACG GTCTGATTGA TGCCGCCCAT CCGCTGTCCG 

851 AGTGGCTGGA CGGCATACGT TTATAG 

This encodes a protein having amino acid sequence <SEQ ID 414>: 

1 MPTVRFTESV SKHDLDALFE WAKASYGAES CWKTLYLNGL PLGNLSPEWA 

51 ERVKKDWEAG CSESSDGIFL NADGWPDMGR RLQHLARIWK EAGLLHGWRD 
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101 ECFDLTDGGS NPLFALERAA FRPFGLLSRA VHLNGLVESD GRWHFWIGRR 

151 SPHKAVDPDK LDNTAAGGVS SGELPSETVC RESSEEAGLD KTLLPLIRPV 

201 SQLH3LRPVS RGVHNEILYV FDAVLPETFL PENQDGEVAG FEKMDIGGLL 

251 AAMLSGNMMH DAQLVTLDAF CRYGLIDAAH PLSEWLDGIR L* 

5 ORFlOSa and ORF105-1 show 93.8% identity in 291 aa overlap: 



orf 105a. pep 
orfl05-l 



MPTVRFTESVSKHDLDALFEWAKASYGAESCWKTLYLNGLPLGNLSPEWAERVKKDWEAG 

I I I I I M I I I I I : I I I I I I i I ! I I I i I I I I I I I I I I I I I I I I 1 I i I I I I : I I I I I I I I I I 
MPTVRFTESVSKQDLDALFEWAKASYGAESCWKTLYLNGLPLGNLSPEWVERVKKDWEAG 



orf 105a. pep 
orfl05-l 



CSESSDGIFLNADGWPDMGRRLQHLARIWKEAGLLHGWRDECFDLTDGGSNPLFALERAA 

I I I I I I I I I I I I I I I I ! I I I I I I i I j : MM M M M M M M h M M M M M 

CSESSDGIFLNADGWPDMGGRLQHLALGWHCAGLLDGWRNECFDLTDGGGNPLFTLERAA 



orf 105a. pep 
orfl05-l 



FRPFGLLSRAVHLNGLVESDGRWHFWIGRRSPHKAVDPDKLDNTAAGGVSSGELPSETVC 
FRPFGLLSRAVHLNGLTESDGRWHFWIGRRSPHKAVDPNKLDNTAAGGVSGGEMPSEAVC 



orf 105a . pep 
orfl05-l 



RESSEEAGLDKTLLPLIRPVSQLHSLRPVSRGVHNEILYVFDAVLPETFLPENQDGEVAG 
RESSEEAGLDKTLLPLIRPVSQLHSLRSVSRGVHNEILYVFDAVLPETFLPENQDGEVAG 



orf 105a. pep 
orfl05-l 



FEKMDIGGLLAAMLSGNMMHDAQLVTLDAFCRYGLIDAAHPLSEWLDGIRLX 
FEKMDIGGLLDAMLSGNMMHDAQLVTLDAFCRYGLIDAAHPLSEWLDGIRLX 



Homology with a predicted ORF from N. gonorrhoeae 

ORF105 shows 87.5% identity over a 312aa overlap with a predicted ORF (ORF105.ng) from N. 
gonorrhoeae: 

orf 105 . pep MVARRAHNPKWGSNPXPATXFQTPRFNAEXVLXLPVSCFLFPAASVFCRIFLPAAISER 60 

orflOSng MVARRAHNPKVVGSNPAPATKYQTPRFNAEGVLF FLFPAASVFCRIFLPAAISER 55 

orf 105 . pep QTAVCLRLQIQAVWLQSSALSSRKPTMPTVRFTESVSKQDLDALFEWSJCASYGAESCWKT 120 

orfl05ng QAAVCLRLQIQAVWLQSSALCSRKPAMPTVRFTESVSKQDLDALFERAJCASYGAESCWKT 115 

orf 105 . pep LYLNGXPLGNLSPEWVERVXKDWEAGCXESSDGIFLNADGWPDMGGRLQHLALGWHCAGL 180 

orfl05ng LYLNRLPLGNLSPEWAERIKKDWEAGCSESSNGIFLNADGWPDMGGRLQHLARTWNKAGL 175 

orf 105. pep LDGWRNECFDLTDGGGNPLFTLERAXXRPXGLLSRAVHLNGLTESDGRWHFWIGRRSPHK 24 0 

I M I M M M M M M M M I M I M Ml M M M M M M M M M M M M M 

orfl05ng LHGWRNECFDLTDGGGNPLFTLERAAFRPFGLLIRAVHLNGLVESNGRWHFWIGRRSPHK 235 

orf 105 . pep AVDPNKLDNTXAGGVSGGEMPSEAVCRESSEEAGLDKTLLPLIRPVSQLHSLRSVSRGVH 300 

orflOSng AVDPGKLDNIAGGGVSGGEMPSEAVCRESSEEAGLDKTLFPLIRPVSRLHSLRPVSRGVH 295 

orf 105. pep NEILYVFDAVLP 312 

orfl05ng NEILYVFDAVLPETFLPENQDGEVAGFEKMDIGGLLDAMLSBCNMMHDAQLVTLDAFYRYG 355 

A complete length ORF105ng nucleotide sequence <SEQ ID 415> was predicted to encode a 
protein having amino acid sequence <SEQ ID 416>: 
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1 MVARRAHNPK VVGSNPAPAT KYQTPRFNAE G VLFFLFPAA SVFCRIFL PA 

51 AISERQAAVC LRLQIQAVWL QSSALCSRKP AMPTVRFTES VSKQDLDALF 

101 ERAKASYGAE SCWKTLYLNR LPLGNLSPEW AERIKKDWEA GCSESSNGIF 

151 LNADGWPDMG GRLQHLARTW NKAGLLHGWR NECFDLTDGG GNPLFTLERA 

201 AFRPFGLLIR AVHLNGLVES NGRWHFWIGR RSPHKAVDPG KLDNIAGGGV 

251 SGGEMPSEAV CRESSEEAGL DKTLFPLIRP VSRLHSLRPV SRGVHNEILY 

301 VFDAVLPETF LPENQDGEVA GFEKMDIGGL LDAMLSKNMM HDAQLVTLDA 

351 FYRYGLIDAA HPLSEWLDGI RL-^ 

Further work revealed the complete nucleotide sequence <SEQ ID 417>: 



ATGCCGACCG 
CCTGTTCGAG 
CGCTGTATCT 
GAGCGCATCA 
CATTTTTCTG 
ACCTCGCCCG 
GAGTGTTTCG 
ACGCGCCGCT 
ACGGTTTGGT 
AGTCCGCACA 
CGGTGTTTCC 
GCGAAGAAGC 
TCGCGGCTGC 
CCTGTATGTG 
AGGATGGCGA 
GATGCCATGT 
GGACGCGTTT 
AGTGGCTGGA 



TCCGTTTTAC 
CGGGCAAAAG 
GAACCGTCTT 
AAAAAGACTG 
AATGCGGACG 
CACATGGAAC 
ACCTGACCGA 
TTCCGTCCGT 
CGAATCGAAC 
AAGCAGTCGa 
GGCGGCGAAA 
CGGTTTGGAT 
ACAGCCTTCG 
TTCGATGCCG 
GGTAGCGGGT 
TGTCGAAAAA 
TACCGTTACG 
CGGCATACGT 



CGAATCCGTC 
CAAGTTACGG 
CCTTTGGGCA 
GGAGGCAGGC 
GCTGGCCGGA 
AAGGCGGGGC 
CGGCGGCGGC 
TCGGACTACT 
GGCAGATGGC 
tcCCGGCAAG 
TGCCGTCTGA 
AAAACGCTGT 
CCCCGTCAGC 
TCCTGCCCGA 
TTTGAAAAGA 
CATGATGCAC 
GTCTGATTGA 
TTATAG 



AGCAAACAAG 
TGCCGAAAGT 
ATCTGTCGCC 
TGCTCCGAGT 
TATGGGCGGA 
TGCTTCACGG 
AACCCCTTGT 
CAGCCGCGCC 
ATTTTTGGAT 
CTCGACAATA 
AGCCGTGTGC 
TTCCGCTCAT 
CGAGGTGTGC 
AACCTTCCTG 
TGGACATTGG 
GACGCGCAAC 
TGCCGCCCAT 



ACCTTGATGC 
TGCTGGAAAA 
GGAATGGGCT 
CTTCAGACGG 
CGCTTGCAGC 
ATGGCGCAAC 
TCACGCTCGA 
GTCCATCTCA 
AGGCAGGCGC 
TTGCCGGCGG 
CGCGAAAGCA 
CCGCCCAGTA 
ACAATGAAAT 
CCTGAAAATC 
CGGCCTATTG 
TGGTTACGCT 
CCGCTGTCCG 



This corresponds to the amino acid sequence <SEQ ID 418; ORF105ng-l>: 

1 MPTVRFTESV SKQDLDALFE RAKASYGAES CWKTLYLNRL PLGNLSPEWA 

51 ERIKKDWEAG CSESSDGIFL NADGWPDMGG RLQHLARTWN KAGLLHGWRN 

101 ECFDLTDGGG NPLFTLERAA FRPFGLLSRA VHLNGLVESN GRWHFWIGRR 

151 SPHKAVDPGK LDNIAGGGVS GGEMPSEAVC RESSEEAGLD KTLFPLIRPV 

201 SRLHSLRPVS RGVHNEILYV FDAVLPETFL PENQDGEVAG FEKMDIGGLL 

251 DAMLSKNMMH DAQLVTLDAF YRYGLIDAAH PLSEWLDGIR L* 

ORG105ng-l and ORF105-1 show 93.5% identity in 291 aa overlap: 



orf 105-1. pep 
orfl05ng-l 



MPTVRFTESVSKQDLDALFEWAKASYGAESCWKTLYLNGLPLGNLSPEWVERVKKDWEAG 
MPTVRFTESVSKQDLDALFERAKASYGAESCWKTLYLNRLPLGNLSPEWAERIKKDWEAG 



orf 105-1. pej 
orfl05ng-l 



CSESSDGIFLNADGWPDMGGRLQHLALGWHCAGLLDGWRNECFDLTDGGGNPLFTLERAA 
CSESSDGIFLNADGWPDMGGRLQHLARTWNKAGLLHGWRNECFDLTDGGGNPLFTLERAA 



orf 105-1 .pel 
orfl05ng-l 



FRPFGLLSRAVHLNGLTESDGRWHFWIGRRSPHKAVDPNKLDNTAAGGVSGGEMPSEAVC 
FRPFGLLSRAVHLNGLVESNGRWHFWIGRRSPHKAVDPGKLDNIAGGGVSGGEMPSEAVC 



orf 105-1 .pep 
orfl05ng-l 



RESSEEAGLDKTLLPLIRPVSQLHSLRSVSRGVHNEILYVFDAVLPETFLPENQDGEVAG 
RESSEEAGLDKTLFPLIRPVSRLHSLRPVSRGVHNEILYVFDAVLPETFLPENQDGEVAG 



orf 105-1. pep 
orfl05ng-l 



FEKMDIGGLLDAMLSGNMMHDAQLVTLDAFCRYGLIDAAHPLSEWLDGIRLX 
FEKMDIGGLLDAMLSKNMMHDAQLVTLDAFYRYGLIDAAHPLSEWLDGIRLX 
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Furthermore, ORF105ng-l shows homology with a yeast enzyme: 

sp|P41888 |TNR3_SCHP0 THIAMIN PYROPHOSPHOKINASE (TPK) (THIAMIN KINASE) 
>gi|1076928|piri IS52350 thiamin pyrophosphokinase (EC 2.7.6.2) - fission yeast 
(Schizosaocharomyces pombe) >gi 1 666111 (XB4417) thiamin pyrophosphokinase 
[Schizosacoharomyces pombe] >gi I 2330852 | gnll PID | e334056 (Z98533) thiamin 
pyrophosphokinase [Schizosaocharomyces pombe] Length = 569 
Score = 105 bits (259), Expect = 4e-22 



Ident: 




3 = 64/192 (33%), Positives = 94/192 (48%), Gaps = 3/192 [1%) 






268 


NKAGLLHGWRNECFDLTDGGGNPLFTLERAAFRPFGLLSRAVHLNGLVESNGRW— HFWI 


441 






N G+ WRNE + + P+ +ER F FG LS VH + + W+ 




Sbjct: 


96 


NTFGIADQWRNELYTVYGKSKKPVLAVERGGFWLFGFLSTGVHCTMYIPATKEHPLRIWV 


155 




442 


GRRSPHKAVDPGKLDNIAGGGVSGGEMPSEAVCRESSEEAGLDKTLFPLIRPVSRLHSLR 


621 






RRSP K P LDN GG++ G+ + +E SEEA LD + LI P + ++ 




Sbjct: 


156 


PRRSPTKQTWPNYLDNSVAGGIAHGDSVIGTMIKEFSEEANLDVSSMNLI-PCGTVSYIK 


214 




622 


PVSRG-VHNEILYVFDAVLPETFLPENQDGEVAGFEKMDIGGLLDAMLSKNMMHDAQLVT 


798 






R + E+ YVFD + + +P DGEVAGF + + +L + K+ + LV 




Sbjct: 


215 


MEKRHWIQPELQYVFDLPVDDLVIPRINDGEVAGFSLLPLNQVLHELELKSFKPNCALVL 


274 




799 


LDAFYRYGLIDAAHP 843 








LD R+G+I HP 




Sbjct: 


275 


LDFLIRHGIITPQHP 289 





Based on this analysis, including the presence of a putative transmembrane domain in the 
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 49 

The following DNA sequence, beheved to be complete, was identified in N. meningitidis <SEQ ID 
419>: 

1 ATGAATAGAC CCAAGCAACC CTTCTTCCGT CCCGAAGTCG CCGTTGCCCG 

51 CCAAACCAGC CTGACGGGTA AAGTGATTCT GACACGACCG TTGTCATTTT 

101 CCCTATGGAC GACATTTGCA TCGATATCTG CGTTATTGAT TATCCTGTTT 

151 TTGATATTTG GTAACTATAC GCGAAAGACA ACAGTGGAGG GACAAATTTT 

201 ACCTGCATCG GGCGTAATCA GGGTGTATGC ACCGgATACG rGkACAATTA 

251 CAGCGAAATT CGTGGAAGAT GGmsAAAAGG TTAAGGCTGG CGACAAGCTA 

301 TTTGCGCTTT CGACCTCACG TTTCGGCGCA GGAGGTAGCG TGCAGCAGCA 

351 GTTGAAAACG GAGGCAGTTT TGAAGAAAAC GTTGGCAGAA CAGGAACTGG 

401 GTCGTCTGAA GCTGATACAC GGGAATGAAA CGCGCAgCcT TAAAGCAACT 

451 GTCGAACGTT TGGAAAACCA GGAACTCCAT ATTTCGCAAC AGATAGACGG 

501 TCAGAAAAGG CGCATTAGAC TTGCGGAAGA AATGTTGCAG AAATATCGTT 

551 TCCTATCCGC . CAATGA 

This corresponds to the amino acid sequence <SEQ ID 420; ORF107>: 

1 MNRPKQPFFR PEVAVARQTS LTGKVILTRP LSFSLWTTFA SISALLIILF 

51 LIFGNYTRKT TVEGQILPAS GVIRVYAPDT XTITAKFVED GXKVKAGDKL 

101 FALSTSRFGA GGSVQQQLKT EAVLKKTLAE QELGRLKLIH GNETRSLKAT 

151 VERLENQELH ISQQIDGQKR RIRLAEEMLQ KYRFLSXQ* 

Computer analysis of this amino acid sequence gave the following resuUs: 
Homology with a predicted ORF from N.meninsitidis (strain A) 

ORF107 shows 97.8% identity over a 186aa overlap with an ORF (ORF107a) from strain A of A^. 
meningitidis: 
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orfl07 .pep MNRPKQPFFRPEVAVARQTSLTGKVILTRPLSFSLWTTFASISALLIILFLIFGNYTRKT 
orflOVa MNRPKQPFFRPEVAVARQTSLTGKVILTRPLSFSLWTTFASISALLIILFLIFGNYTRKT 



orfl07 .pep 
orfl07a 



TVEGQILPASGVIRVYAPDTXTITAKFVEDGXKVKAGDKLFALSTSRFGAGGSVQQQLKT 
TVEGQILPASGVIRVYAPDTGTITAKFXEDGEKVKAGDKLFALSTSRFGAGDSVQQQLKT 



orflOV.pep EAVLKKTLAEQELGRLKLIHGNETRSLKATVERLENQELHISQQIDGQKRRIRLAEEMLQ 
orflOVa EAVLKKTLAEQELGRLKLIHGNETRSLKATVERLENQELHISQQIDGQKRRIRLAEEMLQ 



orf 107 .pep 
orfl07a 



The complete length ORF107a nucleotide sequence <SEQ ID 421> is: 



ATGAATAGAC 
CCAAACCAGC 
CCCTATGGAC 
TTGATATTTG 
ACCTGCATCG 
CNGCGAAATT 
TTTGCGCTTT 
GTTGAAAACG 
GTCGTCTGAA 
GTCGAACGTT 
TCAGAAAAGG 
TCCTATCCGC 
GCAGAGCTTT 
AGTCGGGCTG 
TCCCCCAAGC 



CCAAGCAACC 
CTGACGGGTA 
GACATTTGCA 
GTAACTATAC 
GGCGTAATCA 
CNTGGAAGAT 
CGACCTCACG 
GAGGCAGTTT 
GCTGATACAC 
TGGAAAACCA 
CGCATTAGAC 
CAATGATGCA 
TAGAGCAGAA 
CTTCAGGAAA 
GGCATGA 



NTTCTTCCGT 
AAGTGATTCT 
TCGATATCTG 
GCGAAAGACA 
GGGTGTATGC 
GGAGAAAAGG 
TTTCGGCGCA 
TGAAGAAAAC 
GGGAATGAAA 
GGAACTCCAT 
TTGCGGAAGA 
GTGCCAAAAC 
AGCCAAACTT 
TCCGCACGCA 



CCCGAAGTCG 
GACACGACCG 
CGTTATTGAT 
ACAGTGGAGG 
ACCGGATACG 
TTAAGGCTGG 
GGAGATAGCG 
GTTGGCAGAA 
CGCGCAGCCT 
ATTTCGCAAC 
AATGTTGCAG 
AAGAAATGAT 
GATGCCTACC 
GAATCTGACA 



CCGTTGCCCG 
TTGTCATTTT 
TATCCTGTTT 
GACAAATTTT 
GGGACAATTA 
CGACAAGCTA 
TGCAGCAGCA 
CAGGAACTGG 
TAAAGCAACT 
AGATAGACGG 
AAATATCGTT 
GAATGTCAAG 
GCCGAGAAGA 
TTGGNNAGCC 



This encodes a protein having amino acid sequence <SEQ ID 422>: 



1 MNRPKQPFFR PEVAVARQTS 



LSFSLWTTFA SISALLIILF 



LIFG NYTRKT TVEGQILPAS GVIRVYAPDT GTITAKFXED GEKVKAGDKL 

FALSTSRFGA GDSVQQQLKT EAVLKKTLAE QELGRLKLIH GNETRSLKAT 

151 VERLENQELH ISQQIDGQKR RIRLAEEMLQ KYRFLSANDA VPKQEMMNVK 

201 AELLEQKAKL DAYRREEVGL LQEIRTQNLT LXSLPQAA* 

Homology with a predicted ORF from N. gonorrhoeae 

ORF107 shows 95.7% identity over a 188aa overlap with a predicted ORF (ORF107.ng) from A^. 
gonorrhoeae: 



irfl07 .pep 

irfl07ng 

irfl07.pep 

irfl07ng 

irf 107 .pep 

>rfl07ng 

>rfl07.pep 

)rfl07ng 



MNRPKQPFFRPEVAVARQTSLTGKVILTRPLSFSLWTTFASISALLIILFLIFGNYTRKT 60 
I I I I I I I I I I I I I h I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

MNRPKQPFFRPEVAIARQTSLTGKVILTRPLSFSLWTTFASISALLIILFLIFGNYTRKT 60 

TVEGQILPASGVIRVYAPDTXTITAKFVEDGXKVKAGDKLFALSTSRFGAGGSVQQQLKT 120 

TMEGQILPASGVIRVYAPDTGTITAKFVEDGEKVKAGDKLFALSTSRFGAGGSVQQQLKT 120 

EAVLKKTLAEQELGRLKLIHGNETRSLKATVERLENQELHI SQQI DGQKRRIRLAEEMLQ 180 

EAVLKKTLAEQELGRLKLIHENETRSLKATVERLENQKLHISQQIDGQKRRIRLAEEMLR 180 



KYRFLSXQ 188 
I I I I I I I 
KYRFLSAQ 18 8 
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The complete length ORF107ng nucleotide sequence <SEQ ID 423> is predicted to encode a 
protein having amino acid sequence <SEQ ID 424>: 

1 MNRPKQPFFR PEVAIARQTS LTGKVILTRP LSFSLWT TFA SISALLIILF 

51 LIFG NYTRKT TMEGQILPAS GVIRVYAPDT GTITAKFVED GEKVKAGDKL 

101 FALSTSRFGA GGSVQQQLKT EAVLKKTLAE QELGRLKLIH ENETRSLKAT 

151 VERLENQKLH ISQQIDGQKR RIRLAEEMLR KYRFLSAQ* 

Based on the presence of a putative ransmembrane domain in the gonococcal protein, it is predicted 
that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be usefiil 
antigens for vaccines or diagnostics, or for raising antibodies. 



Example 50 



The following DNA sequence, beheved to be complete, was identified in N. meningitidis <SEQ ID 
425>: 



1 ATGCTGAATA CTTTTTTTGC CGTATTGGGC GGCTGCCTGC TGCT.TTGCC 

51 GTGCGGCAAA TCCGTAAATA CGGCGGTACA GCCGCAAAAC GCGGTACAAA 

101 GCGCGCCGAA ACCGGTTTTC AAAGTCATAT ATATCGACAA TACGGCGATT 

151 GCCGGTTTGG ATTTGGGACA AAGCAGCGAA GGCAAAACCA ACGACGGCAA 

201 AAAACAAATC AGTTATCCGA TTAAAGGCTT GCCGGAACAA AATGTTATCC 

251 GACTGATCGG CAAGCATCCC GGCGACTTGG AAGCCGTCAG CGGCAAATGT 

301 ATGGAAACCG ATGATAAGGA CAGTCCGGCA GGTTGGGCAG AAAACGGCGT 

351 GTGCCATACC TTGTTTGCCA AACTGGTGGG CAATATCGCC GAAGACGGCG 

401 GCAAACTGAC GGATTACCTA GTTTCGCATG CCGCCCTGCA ACCCTATCAG 

451 GCAGGCAAAA GCGGCTATGC CGCCGTGCAG AACGGACGCT ATGTGCTGGA 

501 AATCGACAGC GAAGGGGCGT TTTATTTCCG CCGCCGCCAT TATTGA 

This corresponds to the amino acid sequence <SEQ ID 426; ORF108>: 



1 MLNTFFAVLG GCLLXLPCGK SVNTAVQPQN AVQSAPKPVF KVIYIDNTAI 

51 AGLDLGQSSE GKTNDGKKQI SYPIKGLPEQ NVIRLIGKHP GDLEAVSGKC 

101 METDDKDSPA GWAENGVCHT LFAKLVGNIA EDGGKLTDYL VSHAALQPYQ 

151 AGKSGYAAVQ NGRYVLEIDS EGAFYFRRRH Y* 

Further work revealed the following DNA sequence <SEQ ID 427>: 



1 ATGCTGAAAA CATCTTTTGC CGTATTGGGC GGCTGCCTGC TGCTTGCCGC 

51 GTGCGGCAAA TCCGAAAATA CGGCGGAACA GCCGCAAAAC GCGGTACAAA 

101 GCGCGCCGAA ACCGGTTTTC AAAGTCAAAT ATATCGACAA TACGGCGATT 

151 GCCGGTTTGG ATTTGGGACA AAGCAGCGAA GGCAAAACCA ACGACGGCAA 

201 AAAACAAATC AGTTATCCGA TTAAAGGCTT GCCGGAACAA AATGTTATCC 

251 GACTGATCGG CAAGCATCCC GGCGACTTGG AAGCCGTCAG CGGCAAATGT 

301 ATGGAAACCG ATGATAAGGA CAGTCCGGCA GGTTGGGCAG AAAACGGCGT 

351 GTGCCATACC TTGTTTGCCA AACTGGTGGG CAATATCGCC GAAGACGGCG 

401 GCAAACTGAC GGATTACCTA GTTTCGCATG CCGCCCTGCA ACCCTATCAG 

451 GCAGGCAAAA GCGGCTATGC CGCCGTGCAG AACGGACGCT ATGTGCTGGA 

501 AATCGACAGC GAAGGGGCGT TTTATTTCCG CCGCCGCCAT TATTGA 

This corresponds to the amino acid sequence <SEQ ID 428; ORF108-1>: 



1 MLKTSFAVLG GCLLLAA CGK SENTAEQPQN AVQSAPKPVF KVKYIDNTAI 

51 AGLDLGQSSE GKTNDGKKQI SYPIKGLPEQ NVIRLIGKHP GDLEAVSGKC 

101 METDDKDSPA GWAENGVCHT LFAKLVGNIA EDGGKLTDYL VSHAALQPYQ 

151 AGKSGYAAVQ NGRYVLEIDS EGAFYFRRRH Y* 



Computer analysis of this amino acid sequence gave the following results: 
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Homology with a predicted ORF from N. gonorrhoeae 

ORF108 shows 88.4% identity over a 181aa overlap with a predicted ORF (ORF108.ng) from N. 
gonorrhoeae: 

orflOS.pep MLNTFFAVLGGCLLXLPCGKSVNTAVQPQNAVQSAPKPVFKVIYIDNTAIAGLDLGQSSE 60 

orflOBng MLKIPFAVLGGCLLLAACGKSENTAEQPQNAAQSAPKPVFKVKYIDNTAIAGLALGQSSE 60 

orflOS .pep GKTNDGKKQISYPIKGLPEQNVIRLIGKHPGDLEAVSGKCMETDDKDSPAGWAENGVCHT 120 

orflOSng GKTNDGKKQISYPIKGLPEQNAVRLTGKHPNDLEAWGKCMETDGKDAPSGWAENGVCHT 120 

orf 108 . pep LFAKLVGNIAEDGGKLTDYLVSHAALQPYQAGKSGYAAVQNGRYVLEIDSEGAFYFRRRHY 181 

I I I I I I I I I I I I I I I : I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

orflOBng LFAKLVGNIAEDGGKLTDYLISHSALQPYQAGKSGYAAVQNGRYVLEIDSEGAFYFRRRHY 181 

ORF108-1 shows 92.3% identity with ORFlOSng over the same 181 aa overlap: 

orf 108-1 .pep MLKTSFAVLGGCLLLAACGKSENTAEQPQNAVQSAPKPVFKVKYIDNTAIAGLDLGQSSE 60 

orfl08ng-l MLKIPFAVLGGCLLLAACGKSENTAEQPQNAAQSAPKPVFKVKYIDNTAIAGLALGQSSE 60 

orflOB-1 .pep GKTNDGKKQISYPIKGLPEQNVIRLIGKHPGDLEAVSGKCMETDDKDSPAGWAENGVCHT 120 

orfl08ng-l GKTNDGKKQISYPIKGLPEQNAVRLTGKHPNDLEAWGKCMETDGKDAPSGWAENGVCHT 120 

orf 108-1 .pep LFAKLVGNIAEDGGKLTDYLVSHAALQPYQAGKSGYAAVQNGRYVLEIDSEGAFYFRRRHY 181 

orfl08ng-l LFAKLVGNIAEDGGKLTDYLISHSALQPYQAGKSGYAAVQNGRYVLEIDSEGAFYFRRRHY 181 

The complete length ORF108ng nucleotide sequence <SEQ ID 429> is: 



ATGCTGAAAa tacctTTTGC CGTGTtgggc ggCtgcctGC TGCTTGCCGC 
CTGCGGCAAA TCCGAAAATa cggcggaACA GCCGCAAAAT gcggCACAAA 
GCGCGCCGAA ACCGGTTTTC AAAGTCAAAT ACATCGACAA TACGGCGATT 
GCCGGTTTGG CTTTGGGACA AAGTAGCGAA GGCAAAACCA acgacgGCAA 
AAAACAAATC AGTTATccgA TTAAAGGCTT GCCGGAACAA Aacgccgtcc 
gGCTGACCGG AAAGCATCCC AACGACTTGG AagccgtcgT CGGCAAATGT 
ATGGAAACCG ACGGAAAGGA CGCGCCTTCG GGCTGGGCGG AAAACGGCGT 
GTGCCATACC TTGTTTGCCA AACTGGTGGG CAATATCGCC GAAGACGGCG 
GCAAACTGAC TGATTACCTG ATTTCGCATT CCGCCCTGCA ACCCTATCAG 
GCAGGCAAAA GCGGCTATGC CGCCGTGCAG AACGGACGCT ATGTGCTGGA 
AATCGACAGC GagggGGCGT TTTATttccg ccgccgccat tattgA 



40 This encodes a protein having amino acid sequence <SEQ ID 430>: 



1 MLKIPFA VLG GCLLLAAC GK SENTAEQPQN AAQSAPKPVF KVKYIDNTAI 

51 AGLAL GOSSE GKT NDGKKQI SYPIKGLPEQ NAVRLTGKHP NDLEAWGKC 

101 METDGKDAPS GWAENGVCHT LFAKLVGNIA EDGGKLTDYL ISHSALQPYQ 

151 AGKSGYAAVQ NGRYVLEIDS EGAFYFRRRH Y* 

45 Based on this analysis, including the presence of a predicted prokaryotic membrane lipoprotein 
lipid attachment site (underhned) and a putative ATP/GTP-binding site motif A (P-loop, double- 
imderlined) in the gonococcal protein, it is predicted that the proteins from N. meningitidis and 
N.gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 



50 Example 51 



The following DNA sequence was identified in N. meningitidis <SEQ ID 43 1>: 
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1 ATGGAAGATT TATATATAAT ACTCGCTTTG GGTTTGGTTG CGATGATTGC 

51 CGgATTTATC GATgcgatTg cGggCGGGGG TGGTTTGATT ACGCTGCCCG 

101 CACTCTTGTT GGCAGGTATT CCTCCCGTGT CGGCAATTGC CACCAACAAG 

151 CTGCAAgCAG CCGCTGCTAC GTTTTCAGCT ACGGTTTCTT TTGCACGCAA 

201 AGGTTTGATT GATTGGAAGA AAGGTCTCCC GATTGCCGCA GCATCGTTTG 

251 TAGGCGGCGT GGcCGGTGCA TTATCGGTCA GCTTGGTTTC CAAAGATATT 

301 CTgCTgGCGG TCGTGCCGGT TTTGTTGATA TTTGTCGCAC TGTATTTTGT 

351 GTTTTCGCCC AAGCTCGACG GCAGTAAGGA AGGCAAAGCC AGAATGTCTT 

401 TTTTTCTGTT cGGGCTGACG GTCGC.ACCG CTTTTGGGTT TTTACGACGG 

451 TGTGTTCGGA CCGGGTGTCG GCTCGTTTTT TCTGATTGCC TTTATTGTTT 

501 TGCTCGGCTG CAAgCTGTTG AACGCGATGT CTTACACCAA ATTGGCGAAC 

551 GTTGCCTGCA ATCTTGGTTC GCTATCGGTA TTCCTGCTGC ACGGTTCGAT 

601 TATTTTCCCG ATTGCGGCAA CGaTGGCGGT CGGTGCGTTT GTCGGtGCGA 

651 ATTTAgGTGC GAGATTTGCC GTaCgctTCG GTTCGAAGCT GATTAA 

This corresponds to the amino acid sequence <SEQ ID 432; ORF109>: 

1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIATNK 

51 LQAAAATFSA TVSFARKGLI DWKKGLPIAA ASFVGGVAGA LSVSLVSKDI 

101 LLAWPVLLI FVALYFVFSP KLDGSKEGKA RMSFFLFGLT VXTAFGFLRR 

151 CVRTGCRLVF SDCLYCFARL QAVERDVLHQ IGERCLQSWF AIGIPAARFD 

201 YFPDCGNDGG RCVCRCEFRC EICRTLRFEA D* 

Further work revealed the following DNA sequence <SEQ ID 433>: 



1 ATGGAAGATT TATATATAAT ACTCGCTTTG GGTTTGGTTG CGATGATTGC 

51 CGGATTTATC GATGCGATTG CGGGCGGGGG TGGTTTGATT ACGCTGCCCG 

101 CACTCTTGTT GGCAGGTATT CCTCCCGTGT CGGCAATTGC CACCAACAAG 

151 CTGCAAGCAG CCGCTGCTAC GTTTTCAGCT ACGGTTTCTT TTGCACGCAA 

201 AGGTTTGATT GATTGGAAGA AAGGTCTCCC GATTGCCGCA GCATCGTTTG 

251 TAGGCGGCGT GGCCGGTGCA TTATCGGTCA GCTTGGTTTC CAAAGATATT 

301 CTGCTGGCGG TCGTGCCGGT TTTGTTGATA TTTGTCGCAC TGTATTTTGT 

351 GTTTTCGCCC AAGCTCGACG GCAGTAAGGA AGGCAAAGCC AGAATGTCTT 

4 01 TTTTTCTGTT CGGGCTGACG GTCGCACCGC TTTTGGGTTT TTACGACGGT 

4 51 GTGTTCGGAC CGGGTGTCGG CTCGTTTTTT CTGATTGCCT TTATTGTTTT 

501 GCTCGGCTGC AAGCTGTTGA ACGCGATGTC TTACACCAAA TTGGCGAACG 

551 TTGCCTGCAA TCTTGGTTCG CTATCGGTAT TCCTGCTGCA CGGTTCGATT 

601 ATTTTCCCGA TTGCGGCAAC GATGGCGGTC GGTGCGTTTG TCGGTGCGAA 

651 TTTAGGTGCG AGATTTGCCG TCCGCTTCGG TTCGAAGCTG ATTAAGCCGC 

7 01 TGCTGATTGT CATCAGCATT TCGATGGCTG TGAAATTGTT GATAGACGAG 

751 AGAAATCCGC TGTATCAGAT GATTGTTTCG ATGTTTTAA 

This corresponds to the amino acid sequence <SEQ ID 434; ORF109-1>: 



1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIA TNK 

51 LQAAAATFSA TVSFARKGLI DWKKGLPI AA ASFVGGVAGA LSVSLV SKDI 

101 LLAWPVLLI FVALYF VFSP KLDGSKEGKA R MSFFLFGLT VAPLLGFY DG 

151 VFGPG VGSFF LIAFIVLLGC KL LNAMSYTK LANVACNLGS LSVFLLHGSI 

201 IFPIAATMAV GAFVGA NLGA RFAVRFGSKL IK PLLIVISI SMAVKLLID E 

251 RNPLYQMIVS MF* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF109 shows 95.9% identity over a 147aa overlap with an ORF (ORF109a) from strain A of A^. 
meningitidis: 

10 20 30 40 50 60 

orf 109 . pep MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA 

orfl09a MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA 
10 20 30 40 50 60 

70 80 90 100 110 120 

orf 109 . pep TVSFARKGLIDWKKGLPIAAASFVGGVAGALSVSLVSKDILLAWPVLLIFVALYFVFSP 

orf 10 9a TVSFARKGLIDWKKGLPIAAASFAGGWGALSVSLVSKDILLAWPVLLIFVALYFVFSP 
70 80 90 100 110 120 
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orflOg.pep KLDGSKEGKARMSFFLFGLTVXTAFGFLRRCVRTGCRLVFSDCLYCFARLQAVERDVLHQ 

orfl09a KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK 
130 140 150 160 170 180 

The complete length ORF109a nucleotide sequence <SEQ ID 435> is: 

1 ATGGAAGATT TATACATAAT ACTCGCTTTG GGTTTGGTTG CGATGATTGC 

51 CGGATTTATC GATGCGATTG CGGGTGGGGG TGGTTTGATT ACGCTGCCTG 

101 CACTCTTGTT GGCAGGTATT CCTCCCGTGT CGGCAATTGC CACCAACAAG 

151 CTGCAAGCAG CCGCTGCTAC GTTTTCGGCT ACGGTTTCTT TTGCACGCAA 

201 AGGTTTGATT GATTGGAAGA AAGGTCTCCC GATTGCGGCA GCATCGTTTG 

2 51 CAGGCGGCGT GGTCGGTGCA TTATCGGTCA GCTTGGTTTC CAAAGATATT 

301 CTGCTGGCGG TCGTGCCGGT TTTGTTGATA TTTGTCGCGC TGTATTTTGT 

351 GTTTTCGCCC AAGCTCGACG GCAGTAAGGA AGGCAAAGCC AGAATGTCTT 

4 01 TTTTTCTGTT CGGTCTGACG GTTGCACCAC TTTTGGGTTT TTACGACGGT 

451 GTGTTCGGAC CGGGTGTCGG CTCGTTTTTT CTGATTGCCT TTATTGTTTT 

501 GCTCGGCTGC AAGCTGTTGA ACGCGATGTC TTACACCAAA TTGGCGAACG 

551 TTGCCTGCAA TCTTGGTTCG CTATCGGTAT TCCTGCTGCA CGGTTCGATT 

601 ATTTTCCCGA TTGCGGCAAC GATGGCGGTC GGTGCGTTTG TCGGTGCGAA 

651 TTTAGGTGCG AGATTTGCCG TCCGCTTCGG TTCGAAGCTG ATTAAGCCGC 

7 01 TGCTGATTGT CATCAGCATT TCGATGGCTG TGAAATTGTT GATAGACGAG 

7 51 AGAAATCCGC TGTATCAGAT GATTGTTTCG ATGTTTTAA 

This encodes a protein having amino acid sequence <SEQ ID 436>: 



1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIA TNK 

51 LQAAAATFSA TVSFARKGLI DWKKGLPIA A ASFAGGVVGA LSVSLV SKDI 

101 LLAVVPVLLI FVALYF VFSP KLDGSKEGKA R MSFFLFGLT VAPLLGFY DG 

151 VFGPG VGSFF LIAFIVLLGC KL LNAMSYTK LANVACNLGS LSVFLLHGSI 

2 01 IFPIAATMAV GAFVGA NLGA RFAVRFGSKL IK PLLIVISI SMAVKLLID E 

251 RNPLYQMIVS MF* 

ORF109a and ORF109-1 show 99.2% identity in 262 aa overlap: 

10 20 30 40 50 60 

MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA 

MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA 
10 20 30 40 50 60 



orf 109a. pep 
orfl09-l 



TVSFARKGLIDWKKGLPIAAASFAGGVVGALSVSLVSKDILLAWPVLLIFVALYFVFSP 
TVSFARKGLIDWKKGLPIAAASFVGGVAGALSVSLVSKDILLAWPVLLIFVALYFVFSP 



irf 109a. pep KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK 
)rf 109-1 KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK 



orf 10 9a. pep LANVACNLG3LSVFLLHGSIIFPIAATMAVGAFVGANLGARFAVRFGSKLIKPLLIVISI 
orfl09-l LANVACNLGSLSVFLLHGS IIFPIAATMAVGAFVGANLGARFAVRFGSKLIKPLLIVIS I 



SMAVKLLIDERNPLYQMIVSMFX 
SMAVKLLIDERNPLYQMIVSMFX 
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Homology with a predicted ORF from N.sonorrhoeae 

ORF109 shows 98.3% identity over a 231aa overlap with a predicted ORF (ORF109.ng) from N. 
gonorrhoeae: 

orf 109 .pep MEDLYIILALGLVAMIflGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA 60 

orfl09ng MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA 60 

orf 109. pep TVSFARKGLIDWKKGLPIAAASFVGGVAGALSVSLVSKDILLAWPVLLIFVALYFVFSP 120 

orfl09ng TVSFARKGLIDWKKGLPIAAASFAGGWGALSVSLVSKDILLAWPVLLIFVALYFVFSP 120 

orf 109. pep KLDGSKEGKARMSFFLFGLTVXTAFGFLRRCVRTGCRLVFSDCLYCFARLQAVERDVLHQ 18 0 

orfl09ng KLDGSKEGKARMSFFLFGLTVATAFGFLRRCVRTGCRLVFSDCLYCFARLQAVERDVLHQ 180 

orf 109. pep IGERCLQSWFAIGIPAARFDYFPDCGNDGGRCVCRCEFRCEICRTLRFEAD 231 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orfl09ng IGERCLQSWFAIGIPAARFDYFPDCGNDGGRCVCRCEFRCEICRPLRFEAD 231 

An ORF109ng nucleotide sequence <SEQ ID 437> was predicted to encode a protein having amino 
acid sequence <SEQ ID 438>: 



101 LLAWPVLLI FVALYF VFSP KLDGSKEGKA R MSl 



VATAFGFL RR 



Further work revealed the following gonococcal DNA sequence <SEQ ID 439>: 



ATGGAAGATT 
CGGATTTATC 
CACTCTTGTT 
CTGCAAGCAG 
AGGTTTGATT 
CAGGCGGCGT 
TTGCTGGCGG 
GTTTTCGCCC 
TTTTTCTATT 
GTGTTCGGAC 
GCTCGGCTGC 
TTGCTTGCAA 
ATTTTCCCGA 
TTTAGGTGCG 
TGCTGATTGT 
AGAAATCCGC 



TATACATAAT 
GATGCGATTG 
GGCAGGTATT 
CCGCTGCTAC 
GATTGGAAGA 
GGTCGGTGCA 
TCGTGCCGGT 
AAGCTCGACG 
CGGGCTGACG 
CGGGTGTCGG 
AAGCTGTTGA 
TCTTGGTTCG 
TTGTGGCAAC 
AGATTTGCCG 
CATCAGCATT 
TGTATCAGAT 



ACTCGCTTTG 
CGGGCGGGGG 
CCTCCCGTGT 
GTTTTCGGCT 
AAGGTCTCCC 
TTATCGGTCA 
TTTGTTGATA 
GCAGTAAGGA 
GTTGCACCGC 
CTCGTTTTTT 
ACGCGATGTC 
CTATCGGTAT 
GATGGCGGTC 
TCCGCTTCGG 
TCGATGGCTG 
GATTGTTTCG 



GGTTTGGTTG 
TGGTTTGATT 
CGGCAATTGC 
ACGGTTTCTT 
GATTGCCGCA 
GCTTGGTTTC 
TTTGTCGCGC 
AGGCAAAGCC 
TTTTGGGTTT 
CTGATTGCCT 
TTACACCAAA 
TCCTGCTGCA 
GGTGCGTTTG 
TTCGAAGCTG 
TGAAATTGTT 
ATGTTTTAA 



CGATGATCGC 
ACGCTGCCTG 
CACCAACAAG 
TTGCACGCAA 
GCATCGTTTG 
CAAAGATATT 
TGTATTTTGT 
AGAATGTCTT 
TTACGACGGT 
TTATTGTTTT 
TTGGCGAACG 
CGGTTCGATT 
TCGGTGCGAA 
ATTAAGCCGC 
GATAGACGAG 



This corresponds to the amino acid sequence <SEQ ID 440; ORF109ng-l>: 

1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIA TNK 

51 LQAAAATFSA TVSFARKGLI DWKKGLPIA A ASFAGGWGA LSVSLV SKDI 

101 LLAWPVLLI FVALYF VFSP KLDGSKEGKA R MSFFLFGLT VAPLLGFY DG 

151 VFGPG VGSFF LIAFIVLLGC KL LNAMSYTK LANVACNLGS LSVFLLHGSI 

2 01 IFPIVATMAV GAFVGA NLGA RFAVRFGSKL IK PLLIVISI SMAVKLLID E 

251 RNPLYQMIVS MF* 

ORF109ng-l and ORF109-1 show 98.9% identity in 262 aa overlap: 

10 20 30 40 50 60 

orfl09ng-l.pep MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 109-1 MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA 



)rf 109ng-l . pep TVSFARKGLIDWKKGLPIAAASFAGGVVGALSVSLVSKDILLAWPVLLIFVALYFVFSP 
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orf 109-1 TVSFARKGLIDWKKGLPIAAASFVGGVAGALSVSLVSKDILLAWPVLLIFVALYFVFSP 
70 80 90 100 110 120 

130 140 150 160 170 180 

orfl09ng-l.pep KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK 

orf 109-1 KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK 
130 140 150 160 170 180 

190 200 210 220 230 240 

orfl09ng-l.pep LANVACNLGSLSVFLLHGSIIFPIVATMAVGAFVGANLGARFAVRFGSKLIKPLLIVISI 
I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 109-1 LANVACNLGSLSVFLLHGSIIFPIAATMAVGAFVGANLGARFAVRFGSKLIKPLLIVISI 

190 200 210 220 230 240 



250 260 
orf 109ng-l .pep SMAVKLLIDERNPLYQMIVSMFX 
I I I I I I I I I I I I I I I I I I I I I I I 
orf 10 9-1 SMAVKLLIDERNPLYQMIVSMFX 

250 260 

In addition, ORF109ng-l shows homology to a hypothetical Pseudomonas protein: 

sp|P29942|YCB9_PSEDE HYPOTHETICAL 27.4 KD PROTEIN IN COBO 3'REGION (0RF9) 
>gi I 94984 Ipir I 1 138164 hypothetical protein 9 - Pseudomonas sp >gi|551929 
(M62866) 0RF9 [Pseudomonas denitrif icans] Length = 261 
Score = 175 bits (439), Expect = 3e-43 

Identities = 83/214 (38%), Positives = 131/214 (60%), Gaps = 1/214 (0%) 

Query: 41 PPVSAIATNKLQXXXXXXXXXXXXXRKGLIDWKKGLPIXXXXXXXXXXXXXXXXXXXKDI 100 

PP+ + TNKLQ R+G ++ K+ LP+ D+ 

Sbjct; 43 PPLQTLGTNKLQGLFGSGSATLSYARRGHVNLKEQLPMALMSAAGAVLGALLATIVPGDV 102 

Query: 101 LLAWPVLLIFVALYFVFSPKLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFF 160 

L A++P LLI +ALYF P + G + +R++ F+F LT+ PL+GFYDGVFGPG GSFF 
Sbjct: 103 LKAILPFLLIAIALYFGLKPNM-GDVDQHSRVTPFVFTLTLVPLIGFYDGVFGPGTGSFF 161 

Query: 161 LIAFIVLLGCKLLNAMSYTKLANVACNLGSLSVFLLHGSIIFPIVATMAVGAFVGANLGA 220 

++ F+ L G +L A ++TK N N+G+ VFL G++++ + M +G F+GA +G+ 
Sbjct: 162 MLGFVTLAGFGVLKATAHTKFLNFGSNVGAFGVFLFFGAVLWKVGLLMGLGQFLGAQVGS 221 

Query: 221 RFAVRFGSKLIKPLLIVISISMAVKLLIDERNPL 254 

R+A+ G+K+IKPLL+++SI++A++LL D +PL 
Sbjct: 222 RYAMAKGAKIIKPLLVIVSIALAIRLLADPTHPL 255 

Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and then- epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 52 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 441>: 

1 . . CTGCTAGGGT ATTGCATCGG TTATCGGTAC GgCTGTTGCA GCAAAACCAG 

51 CCGCAGACGG ATTATTTGGT CAAATTCGGA TCGTTTTGGG CGAG.ATTTT 

101 TGGTTTTCTG GGACTGTATG ACGTCTATGC TTCGGCATGG TTTGTCGTTA 

151 TCATGATGTT TTTGGTGGTT TCTACCAGTT TGTGCCTGAT TCGCAATGTG 

201 CCGCCGTTCT GGCGCGAAAT GAAGTCTTTT CGGGAAAAGG TTAAAGAAAA 

251 ATCTCTGGCG GCGATGCGCC ATTCTTCGCT GTTGGATGTA AAAATTGCGC 

301 CCGAGGTTGC CAAACGTTAT CTGGAAGTAC AAGGTTTTCA GGGGAAAACC 

351 ATTAACCGTG AAGACGGGTC GGTTCTGATT GCCGCCAAAA AAGGCACAAT 

401 GAACAAATGG GGCTATATCT TTGCCCATGT TGCTTTGATT GTCATTTGCC 

451 TGGGCGGGTT GATAGACAGT AACCTGCTGT TGAAACTGGG TATGCTGACC 

501 GGTCGGATTG TTCCGGACAA TCAGGCGGTT TATGCCAAGG ATTTC.AAGC 
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This corresponds to the amino acid sequence <SEQ ID 442; ORFl 10>: 

1 ..LLGIASVIGT LLQQNQPQTD YLVKFGSFWA XIFGFLGLYD VYASAWFWI 

51 MMFLWSTSL CLIRNVPPFW REMKSFREKV KEKSLAAMRH SSLLDVKIAP 

101 EVAKRYLEVQ GFQGKTINRE DGSVLIAAKK GTMNKWGYIF AHVALIVICL 

151 GGLIDSNLLL KLGMLTGRIF RTIRRFMPRI XKPESXFGCV QSLI*GQRQY 

201 FXRGRVRMWF S* 

Computer analysis of this amino acid sequence gave the following resuUs: 



10 Homology with ORF88a from N.meninsitidis (strain A) 

ORFl 10 shows 91.5% identity over a 188aa overlap with ORF88a from strain A oiN. meningitidis: 

10 20 30 40 50 60 

orfSSa.pep MSKSRRSPPLLSRPWFAFFSSMRFA VALLSLLGIASVIGTVL QQNQPQTDYLVKFGSFWA 

15 orfllO LLGIASVIGTLL QQNQPQTDYLVKFGSFWA 

10 20 30 



70 80 90 100 110 120 

orfSSa.pep QIFGFLGLYDVYASAW FWIMMFLWSTSLCLI RNVPPFWREMKSFREKVKEKSLAAMRH 
20 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

orf 110 XIFGFLGLYDVYASAW FVVIMMFLWSTSLCLI RNVPPFWREMKSFREKVKEKSLAAMRH 

40 50 60 70 80 90 



25 orfSSa.pep SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWG YIFAHVALIVICL 

orfllO SS LLDVKI APEVAKRYLE VQGFQGKT INRE DG S VL lAAKKGTMNKWG YIFAHVALIVICL 

100 110 120 130 140 150 

30 190 200 210 220 230 240 

orfSSa.pep GGLI DSNLLLKLGMLTGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNISEGQSADWF 

orfllO GGLI DSNLLLKLGMLTGRIFRTIRRFMPRIXKPESXFGCVQSLIXGQRQYFXRGRVRMWF 
160 170 180 190 200 210 

35 

250 260 270 280 290 300 

orfSSa.pep LNADNGILVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNHPLT 

orfllO SX 

40 However, ORF88 and ORF 1 1 0 do not ahgn, because they represent two different fragments of the 



Homology with a predicted ORF from N.sonorrhoeae 

ORFl 10 shows 88.6% identity over a 21 laa overlap with a predicted ORF (ORFllO.ng) from jV. 
gonorrhoeae: 



orfllO . pep 


LLGIASVIGTLLQQNQPQTDYLVKFGSFWA 

1 1 1 1 1 1 1 1 1 1 : 1 1 1 1 1 1 1 1 1 1 1 1 1 M II: 


30 


orfllOng 


MSKSRISPTLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGPFWT 


60 


orf 110. pep 


XIFGFLGLYDVYASAWFWIMMFLWSTSLCLIRNVPPFWREMKSFREICVKEKSLAAMRH 
RIFDFLGLYDVYASAWFWIMMFLWSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 


90 


orfllOng 


120 


orf 110. pep 


SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL 


150 


orfllOng 


SSLLDVKIAPEVAKRYLEVRGFQGKTVSREDGSVLIAAKKGTMNKWGYIXAHVALIVICL 


180 
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orfllO.pep GGLIDSNLLLKLGMLTGRIFRTIRRFMPRIXKPESXFGCVQSLIXGQRQYFXRGRVRMWF 210 



The complete length ORFllOng nucleotide sequence <SEQ ID 443> is predicted to encode a 
protein having amino acid sequence <SEQ ID 444>: 



1 MSKSRISPTL LSRPWFAFFS SMRF AVALLS LLGIASVIGT VL QQNQPQTD 

51 YLVKFGPFWT RIFDFLGLYD VYASAW FWI MMFLVVSTSL CLI RNVPPFW 

101 REMKSFREKV KEKSLAAMRH SSLLDVKIAP EVAKRYLEVR GFQGKTVSRE 

151 DGSVLIAAKK GTMNKWGYIX A HVALIVICL GRLINXH LLL KLGMLAGSIF 

201 RNNRRVMPRI SKPESIWGGV QSLIKGQRQY FQRGKVRMWF S* 



Based on the putative transmembrane domains in the gonococcal protein, it is predicted that the 
proteins from N. meningitidis and N.gonorrhoeae, and their epitopes, could be useful antigens for 
vaccines or diagnostics, or for raising antibodies. 

Example 53 

The following DNA sequence was identified in N. meningitidis <SEQ ID 445>: 



1 ATGCCGTCTG AAACACGCCT GCCGAACTTT ATCCGCGTCT TGATATTTGC 

51 CCTGGGTTTC ATCTTCCTGA ACGCCTGTTC GGAACAAACC GCGCAAACCG 

101 TTACCCTGCA AGGCGAAACG ATGGGCACGA CCTATACCGT CAAATACCTT 

151 TCAAATAATC GGGACAAACT CCCCTCACCT GCCGAAATAC AAAAACGCAT 

201 CGATGACGCG CTTAAAGAAG TCAACCGGCA GATGTCCACC TATCAGCCCG 

251 ACTCCGAAAT CAGCCGGTTC AACCAACACA CAGCCGGCAA GCCCCTCCGC 

301 ATTTCAAGCG ACTTCGCACA CGTTACTGCC GAAGCCGTCC GCCTGAACCG 

351 CCTGACACAC GGCGCGCTGG ACGTAACCGT CGGCCCCTTG GTCAACCTTT 

4 01 GGGGATTCGG CCCCGACAAA TCCGTTACCC GTGAACCGTC GCCGGAACAA 

451 ATCAAACAGG CGGCATCTTA TACGGGCATA GACAAAATCA TTTTGAAACA 

501 AGGCAAAGAT TACGCTTCCT TGAGCAAAAC CCACCCCAAG GCCTATTTGG 

551 ATTTATCTTC GATTGCCAAA GGCTTCGGCG TTGATAAAGT TGCGGGCGAA 

601 CTGGAAAAAT ACGGCATTCA AAATTATCTG GTCGAAATCG GCGGCGAGTT 

651 GCACGGCAAA GGCAAAAACG CGCGCGGCGA ACCGTGGCGC ATCGGTATCG 

701 AGCAGCCCAA TATCGTCCAA GGCGGCAATA CGCAGATTAT CGTCCCGCTG 

751 AACAACCGTT CGCTTGCCAC TTCCGGCGAT TACCGTATTT TCCACGTCGA 

801 TAAAAACGGC AAACGCCTCT CCCATATCAT CAACCCGAAC AACAAACGAC 

851 CCATCAGCCA CAACCTCGCC TCCATCAGCG TGGTCGCAGA CAGTGCGATG 

901 ACGGCGGACG GCTTGTCCAC AGGATTATTC GTATTGGGCG AAACCGAAGC 

951 CTTAAAGCTG GCAGAGCGCG AAAAACTCGC TGTTTTCCTG ATTGTCAGGG 

1001 ATAAAGGCGG CTACCGCACC GCCATGTCTT CCGAATTTGA AAAACTGCTC 

1051 CGCTAA 



This corresponds to the amino acid sequence <SEQ ID 446; ORFl 1 1>: 



1 MPSETRLPNF IRVLIFALGF IFLNA CSEQT AQTVTLQGET MGTTYTVKYL 

51 SNNRDKLPSP AEIQKRIDDA LKEVNRQMST YQPDSEISRF NQHTAGKPLR 

101 ISSDFAHVTA EAVRLNRLTH GALDVTVGPL VNLWGFGPDK SVTREPSPEQ 

151 IKQAASYTGI DKIILKQGKD YASLSKTHPK AYLDLSSIAK GFGVDKVAGE 

201 LEKYGIQNYL VEIGGELHGK GKNARGEPWR IGIEQPNIVQ GGNTQIIVPL 

251 NNRSLATSGD YRIFHVDKNG KRLSHIINPN NKRPISHNLA SISWADSAM 

301 TADGLSTGLF VLGETEALKL AEREKLAVFL IVRDKGGYRT AMSSEFEKLL 

351 R* 



orfllOng 




orfllO .pep 



S 211 



orfllOng 



S 241 



Computer analysis of this amino acid sequence gave the following results: 



-268- 



Homology with a predicted ORF from N.meninsitidis Tstrain A) 

ORFl 1 1 shows 96.9% identity over a 351aa overlap with an ORF (ORFl 11a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

MPSETRLPNFIRTLIFALSFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDXLPSP 

MPSETRLPNFIRVLIFALGFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDKLPSP 
10 20 30 40 50 60 

70 80 90 100 110 120 

AEIQXRIDDALKEVNRQMSTYQPDSEISRFNQHTAGKPLRISSDFAHVTAEAVHLNRLTH 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I 
AEIQKRIDDALKEVNRQMSTYQPDSEISRFNQHTAGKPLRISSDFAHVTAEAVRLNRLTH 
70 80 90 100 110 120 



orfllla.pep 
orflll 

orfllla.pep 
orflll 



GALDVTVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKIILKQGKDYASLSKTHPK 
GALDVTVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKIILKQGKDYASLSKTHPK 



AYLDLSSIAKGFGVDXVAGELEKYGIQNYLVEIGGELHGKXKNARGEPWRIGIEQPNIVQ 
AYLDLSSIAKGFGVDKVAGELEKYGIQNYLVEIGGELHGKGKNARGEPWRIGIEQPNIVQ 



GGNTQIIVPLNNRSXATSGDYRIFHVDK3GKRL3HIINPNNKRPISHNLASISVXADSAM 
GGNTQIIVPLNNRSLATSGDYRIFHVDKNGKRLSHIINPNNKRPISHNLASISWADSAM 



310 320 330 340 350 

TADGXSTGLFVLGETEALKLAEREKLAVFLIVRDKGGYRTAMSSEFEKLLRX 

TADGLSTGLFVLGETEALKLAEREKLAVFLIVRDKGGYRTAMSSEFEKLLRX 
310 320 330 340 350 

The complete length ORFl 11a nucleotide sequence <SEQ ID 447> is: 

1 ATGCCGTCTG AAACACGCCT GCCGAACTTT ATCCGCACCT TGATATTTGC 

51 CCTGAGTTTT ATCTTCCTGA ACGCCTGTTC GGAACAAA.CC GCGCAAACCG 

101 TTACCCTGCA AGGTGAAACG ATGGGCACGA CCTATACCGT CAAATACCTT 

151 TCAAATAATC GGGACNAACT CCCNTCACCT GCCGAAATAC AAAANCGCAT 

201 CGATGACGCG CTTAAAGAAG TCAACCGGCA GATGTCCACC TATCAGCCCG 

251 ACTCCGAAAT CAGCCGGTTC AACCAACACA CAGCCGGCAA GCCCCTCCGC 

301 ATTTCAAGCG ACTTCGCACA CGTTACTGCC GAAGCCGTCC ACCTGAACCG 

351 CCTGACACAC GGCGCGCTGG ACGTAACCGT CGGCCCCTTG GTCAACCTTT 

401 GGGGATTCGG CCCCGACAAA TCCGTTACCC GTGAACCGTC GCCGGAACAA 

451 ATCAAACAAG CAGCATCTTA TACGGGCATA GACAAAATCA TTTTGAAACA 

501 AGGCAAAGAT TACGCTTCCT TGAGCAAAAC CCACCCCAAG GCCTATTTGG 

551 ATTTATCTTC GATTGCCAAA GGCTTCGGCG TTGATNANGT TGCGGGCGAA 

601 CTGGAAAAAT ACGGCATTCA AAATTATCTG GTCGAAATCG GCGGNGAGTT 

651 GCACGGCAAA GNCAAAAACG CGCGCGGCGA ACCTTGGCGC ATCGGCATCG 

701 AACAGCCCAA CATCGTCCAA GGCGGCAATA CGCAGATTAT CGTCCCGCTG 

751 AACAACCGTT CGNTTGCCAC TTCCGGCGAT TACCGTATTT TCCACGTCGA 

801 TAAAAGCGGC AAACGCCTCT CCCATATCAT TAATCCGAAC AACAAACGAC 

851 CCATCAGCCA CAACCTCGCC TCCATCAGCG TGNTCGCAGA CAGTGCGATG 

901 ACGGCGGACG GCTTNTCCAC AGGATTATTC GTATTGGGCG AAACCGAAGC 

951 CTTAAAGCTG GCAGAGCGCG AAAAACTCGC TGTTTTCCTG ATTGTCAGGG 

1001 ATAAAGGCGG CTACCGCACC GCCATGTCTT CCGAATTTGA AAAACTGCTC 

1051 CGCTAA 

This encodes a protein having amino acid sequence <SEQ ID 448>: 



orfllla.pep 
orflll 



1 MPSETRLPNF IRTLIFALSF IFLNA CSEQT AQTVTLQGET MGTTYTVKYL 
51 SNNRDXLPSP AEIQXRIDDA LKEVNRQMST YQPDSEISRF NQHTAGKPLR 
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101 ISSDFAHVTA EAVHLNRLTH 

151 IKQAASYTGI DKIILKQGKD 

201 LEKYGIQNYL VEIGGELHGK 

251 NNRSXATSGD YRIFHVDKSG 

301 TADGXSTGLF VLGETEALKL 

351 R* 



GALDVTVGPL VNLWGFGPDK SVTREPSPEQ 
YA3LSKTHPK AYLDLSSIAK GFGVDXVAGE 
XKNARGEPWR IGIEQPNIVQ GGNTQIIVPL 
KRLSHIINPN NKRPISHNLA SISVXAD3AM 
AEREKLAVFL IVRDKGGYRT AMSSEFEKLL 



Homology with a predicted QRF from N.sonorrhoeae 

ORFl 1 1 shows 96.6% identity over a 351aa overlap with a predicted ORF (ORFl ll.ng) from N. 
gonorrhoeae: 

10 20 30 40 50 60 

orflllng MPSETRLPNLIRALIFALGFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDKLPSP 

I I I I I I I I I : I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I 
orflll MPSETRLPNFIRVLIFALGFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDKLPSP 

10 20 30 40 50 60 

70 80 90 100 110 120 

orflll AKIQKRIDDALKEVNRQMSTYQTDSEISRFNQHTAGKPLRISSDFAHVTAEAVRLNRLTH 

orflll AEIQKRIDDALKEVNRQMSTYQPDSEISRFNQHTAGKPLRISSDFAHVTAEAVRLNRLTH 
70 80 90 100 110 120 

130 140 150 160 170 180 

orflllng GALDVTVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKIILQQGKDYASL3KTHPK 

orflll GALDVTVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKIILKQGKDYASLSKTHPK 
130 140 150 160 170 180 



AYLDLSSIAKGFGVDKVAGELEKYGIQNYLVEIGGELHGKGKNAHGEPWRIGIEQPNIIQ 
AYLDLSSIAKGFGVDKVAGELEKYGIQNYLVEIGGELHGKGKNARGEPWRIGIEQPNIVQ 



250 260 270 280 290 300 

orflllng GGNTQIIVPLNNRSLAT3GDYRIFHVDKNGKRLSHIINPNNKRPISHNLASISWSDSAM 

orflll GGNTQIIVPLNNRSLAT3GDYRIFHVDKNGKRLSHIINPNNKRPI3HNLASISWADSAM 
250 260 270 280 290 300 

310 320 330 340 350 

orflllng TADGLSTGLFVLGETEALRLAEQEKLAVFLIVRDKDGYRTAMSSEFAKLLRX 

orflll TADGLSTGLFVLGETEALKLAEREKLAVFLIVRDKGGYRTAMS3EFEKLLRX 
310 320 330 340 350 

The complete length ORFl 1 Ing nucleotide sequence <SEQ ID 449> is: 



ATGCCGTCTG 
CCTGGGTTTC 
TTACCCTGCA 
TCAAATAATC 
TGATGATGCG 
ATTCCGAAAT 
ATTTCAAGCG 
CCTGACTCAC 
GGGGGTTCGG 
ATCAAACAGG 
AGGCAAAGAT 
ATTTATCTTC 
CTGGAAAAAT 
GCACGGCAAA 
AGCAACCCAA 
aaCaaccgtt 
TAAAAAcggc 
ccATCAGcca 
ACGGCGGACG 
CTTAAGGCTG 



AAACACGCCT 
ATCTTCCTGA 
AGGCGAAAcg 
GGGACAAACT 
CTTAAAGAAG 
CAGCCGGTTC 
ATTTCGCACA 
GGCGCACTGG 
CCCCGACAAA 
CGGCATCTTA 
TACGCTTCCT 
GATTGCCAAA 
ACGGCATTCA 
GGCAAAAATG 
TATCATCCAA 
cgctTGCCAC 
aaacgccttt 
caacctcgcc 
GTTtatCCAC 
GCAGAACAAG 



GCCGAACCTT 
ACGCCTGTTC 
aTGGGTACGA 
CCCCTCCCCT 
TCAACCGGCA 
AACCAACACA 
CGTTACCGCC 
ACGTAACCGT 
TCCGTTACCC 
TACGGGCATA 
TGAGCAAAAC 
GGCTTCGGCG 
AAATTATCTG 
CGCACGGCGA 
GgcgGCAata 
TTCCGGCGAT 
cccacaTCAT 
tccatcagcg 
AGGATTATTT 
AAAAACTCGC 



ATCCGCGCCT 
GGaacaaacC 
CCTATACCGT 
GCCAAAATAC 
GATGTCCACC 
CAGCCGGCAA 
GAAGCCGTCC 
CGGCCCTTTG 
GTGAACCGTC 
GACAAAATCA 
CCACCCCAAA 
TTGATAAAGT 
GTCGAAAtcg 
ACCGTGGCGC 
CGCAGATTAt 
TAccgtaTTT 
CAATCCCaAC 
tggtctcAGA 
GTTTTAGGCG 
TGTTTTCCTA 



TGATATTTGC 
GCGCAaaccg 
CAAATACCTT 
AAAAGCGCAT 
TACCAGACCG 
GCCCCTCCGC 
GCCTGAACCG 
GTCAACCTTT 
GCCGGAACAA 
TTTTGCAACA 
GCCTATTTGG 
TGCGGGCGAA 
gcggcGAGTT 
ATCGGTATAG 
cgtcccgctg 
tccacgtcgA 
aacAAACgac 
CAGTGCAATG 
AAACCGAAGC 
ATTGTCCGGG 
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1001 ATAAGGACGG CTACCGCACC GCCATGTCTT CCGAATTTGC CAAGCTGCTC 
1051 CGCTAA 

This encodes a protein having amino acid sequence <SEQ ID 450>: 

1 MPSETRLPNL IRALIFALGF IFLNA CSEQT AQTVTLQGET MGTTYTVKYL 

51 SNNRDKLPSP AKIQKRIDDA LKEVNRQMST YQTDSEISRF NQHTAGKPLR 

101 ISSDFAHVTA EAVRLNRLTH GALDVTVGPL VNLWGFGPDK SVTREPSPEQ 

151 IKQAASYTGI DKIILQQGKD YASLSKTHPK AYLDLSSIAK GFGVDBCVAGE 

201 LEKYGIQNYL VEIGGELHGK GKNAHGEPWR IGIEQPNIIQ GGNTQIIVPL 

251 NNRSLATSGD YRIFHVDKNG KRLSHIINPN NKRPISHNLA SISWSDSAM 

301 TADGLSTGLF VLGETEALRL AEQEKLAVFL IVRDKDGYRT AMSSEFAKLL 

351 R* 

This protein shosw homology with a hypothetical lipoprotein precursor from H. influenzae: 

spl P44550 |YOJL_HAEIN HYPOTHETICAL LIPOPROTEIN HI0172 PRECURSOR >gi | 1074292 | pir 
hypothetical protein HI0172 - Haemophilus influenzae (strain Rd FCW20) 
>gi 1 1573128 (U32702) hypothetical [Haemophilus influenzae] Length = 346 
Score = 353 bits (896), Expect = 9e-97 

Identities = 181/344 (52%), Positives = 247/344 (71%), Gaps = 4/344 (1%) 

Query: 7 LPNLIRALIFALGFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDKLPSPAKIQKR 66 

+ LI +1 + L AC ++T + ++L G+TMGTTY VKYL + S K + 

Sbjct: 1 MKKLISGIIAVAMALSLAACQKET-KVISLSGKTMGTTYHVKYLDDGSITATSE-KTHEE 58 

Query: 67 IDDALKEVNRQMSTYQTDSEISRFNQHT-AGKPLRISSDFAHVTAEAVRLNRLTHGALDV 125 

1+ LK+VN +MSTY+ DSE+SRFNQ+T P+ IS+DFA V AEA+RLN++T GALDV 
Sbjct: 59 lEAILKDVNAKMSTYKKDSELSRFNQNTQWTPIEISADFAKVLAEAIRLNKVTEGALDV 118 

Query: 126 TVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKI ILQQGKDYASLSKTHPKAYLDL 185 

TVGP+VNLWGFGP+K ++P+PEQ+ + ++ GIDKI L K+ A+LSK P+ Y+DL 
Sbjct: 119 TVGPWNLWGFGPEKRPEKQPTPEQLAERQAWVGIDKITLDTNKEKATLSKALPQVYVDL 178 

Query: 186 SSIAKGFGVDfCVAGELEKYGIQNYLVEIGGELHGKGKNAHGEPWRIGIEQPNIIQGGNTQ 245 
SSIAKGFGVD+VA +LE+ QNY+VEIGGE+ KGKN G+PW+I lE+P + 

Sbjct: 17 9 SSIAKGFGVDQVAEKLEQLNAQNYMVEIGGEIRAKGKNIEGKPWQIAIEKPTTTGERAVE 238 

Query: 24 6 IIVPLNNRSLATSGDYRIFHVDKNGKRLSHIINPNNKRPISHNLASISVVSDSAMTADGL 305 

++ LNN +A+SGDYRI+ ++NGKR +H I+P PI H+LASI+V++ ++MTADGL 

Sbjct: 239 AVIGLNNMGMASSGDYRIY-FEENGKRFAHEIDPKTGYPIQHHLASITVLAPTSMTADGL 297 

Query: 306 STGLFVLGETEALRLAEQEKLAVFLIVRDKDGYRTAMSSEFAKL 349 

STGLFVLGE +AL +AE+ LAV+LI+R +G+ T SS F KL 
Sbjct: 298 STGLFVLGEDKALEVAEKNNLAVYLIIRTDNGFVTKSSSAFKKL 341 

Based on this analysis, it is predicted that the proteins ^omN. meningitidis and N. gonorrhoeae, 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 54 



The following partial DNA sequence was identified in N.meningitidis <SEQ ID 45 1>: 

1 . . CCGTGCCGCC GACAGGGCGA CGACGTGTAT GCGGCGCACG CGTCCCGTCA 

51 AAAATTGTGG CTGCGCTTCA TCGGCGGCCG GTCGCATCAA AATATACGGG 

101 GCGGCGCGGC TGCGGACGGG TGGCGCAAAG GCGTGCAAAT CGGCGGCGAG 

151 GTGTTTGTAC GGCAAAATGA AGGCAGCCl^A yTGGCAATCG GCGTGATGGG 

201 CGGCAGGGCC GGCCAGCACG CwTCAGTCAA CGGCAAAGGC GGTGCGGCAG 

251 gCAGTGATTT GTATGGTTAT GgCGGGGgTG TTTATGCTgC GTGGCATCAG 

301 TTGCGCGATA AACAAACGGG TgCGTATTTG GACGGCTGGT TGCAATACCA 

351 ACGTTTCAAA CACCGCATCA ATGATGAAAA CCGTGCGGAA CgCTACAAAA 

401 CCAAAGGTTG GACGGCTTCT GTCGAAGGCG GCTACAACGC GCTTGTGGCG 

451 GAAGGCATTG TCGGAAAAGG CAATAATGTG CGGTTTTACC TACAACCGCA 

501 GgCGCAGTTT ACCTACTTGG GCGTAAACGG CGGCTTTACC GACAGCGAGG 

551 GGACGGCGGT CGGACTGCTC GGCAGCGGTC AGTGGCAAAG CCGCGCCGGC 

601 AtTCGGGCAA AAACCCGTTT TGCTTTGCGT AACGGTGTCA ATCTTCAGCC 

651 TTTTGCCGCT TTTAATGTtt TGCACAGGTC AAAATCTTTC GGCGTGGAAA 

701 TGGACGGCGA AAAACAGACG CTGGCAGGCA GGACGGCACT CGAAGGGCGG 
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751 TTCGGTATTG AAGCCGGTTG GAAAGGCCAT ATGTCCGCA. . 

This corresponds to the amino acid sequence <SEQ ID 452; ORF35>: 

1 . . PCRRQGDDVY AAHASRQKLW LRFIGGRSHQ NIRGGAAADG WRKGVQIGGE 

51 VFVRQNEGSX LAIGVMGGRA GQHASVNGKG GAAGSDLYGY GGGVYAAWHQ 

5 101 LRDKQTGAYL DGWLQYQRFK HRINDENRAE RYKTKGWTAS VEGGYNALVA 

151 EGIVGKGNNV RFYLQPQAQF TYLGVNGGFT DSEGTAVGLL GSGQWQSRAG 

201 IRAKTRFALR NGVNLQPFAA FNVLHRSKSF GVEMDGEKQT LAGRTALEGR 

251 FGIEAGWKGH MSA. . 

Computer analysis of this amino acid sequence gave the following results: 

10 Homology with putative secreted VirG-homolgue of iV. meninsitidis (accession number A32247) 
ORF and virg-h protein show 51% aa identity in 261aa overlap: 

Orf35 5 QGDDVYAAHASRQKLWLRFIGGRSHQNIRGGAA-ADGWRKGVQIGGEVFVRQNEGSXLAI 63 

+ D++ R+ LWLR I G S+Q ++G A +G+RKGVQ+GGEVF QNE + L+I 

virq-h 396 KNSDIFDRTLPRKGLWLRVIDGHSNQWVQGKTAPVEGYRKGVQLGGEVFTWQNE3NQLSI 455 

15 

Orf35 64 GVMGGRAGQHASVNGKG— GAAGSDLYGYGGGVYAAWHQLRDKQTGAYLDGWLQYQRFKH 121 

G+MGG+A Q ++ + ++ G+G GVYA WHQL+DKQTGAY D W+QYQRF+H 

virg-h 456 GLMGGQAEQRSTFHNPDTDNLTTGNVKGFGAGVYATWHQLQDKQTGAYADSWMQYQRFRH 515 

20 Orf35 122 RINDENRAERYKTKGWTASVEGGYNALVAEGIVGKGNNVRFYLQPQAQFTYLGVNGGFTD 181 

RIN E+ ER+ +KG TAS+E GYNAL+AE KGN++R YLQPQAQ TYLGVNG F+D 
virg-h 516 RINTEDGTERFTSKGITASIEAGYNALLAEHFTKKGNSLRVYLQPQAQLTYLGVNGKFSD 575 

Orf35 182 SEGTAVGLLGSGQWQSRAGIRAKTRFALRNGVNLQPFAAFNVLHRSKSFGVEMDGEKQTL 241 
25 SE V LLCS Q Q+R G++AK +F+L + ++PFAA N L+ +K FGVEMDGE++ + 

virg-h 57 6 SENAHVNLLGSRQLQTRVGVQAKAQFSLYKNIAIEPFAAVNALYHNKPFGVEMDGERRVI 635 



Orf35 



242 AGRTALEGRFGIEAGWKGHMS 2 62 

+TA+E + G+ K H++ 
636 NNKTAIESQLGVAVKIKSHLT 656 



Homology with a predicted ORF from N.meningitidis (strain A) 

ORF35 shows 96.9% identity over a 259aa overlap with an ORF (ORF35a) from sfrain A of A^. 
meningitidis: 

10 20 30 

orf 35 . pep PCRRQGDDVYAAHASRQKLWLRFIGGRSHQNIRG 

or f 35a QRLAIPEAEAVLYAQQAYAANTLFGLRAADRGDDVYAADPSRQKLWLRFIGGRSHQNIRG 



irf35.pep 
)rf35a 



GAAADGWRKGVQIGGEVFVRQNEGSXLAIGVMGGRAGQHASVNGKGGAAGSDLYGYGGGV 
GAAADGRRKGVQIGGEVFVRQNEGSRLAIGVMGGRAGQHASVNGKGGAAGSYLHGYGGGV 



YAAWHQLRDKQTGAYLDGWLQYQRFKHRINDENRAERYKTKGWTASVEGGYNALVAEGIV 
YAAWHQLRDKQTGAYLDGWLQYQRFKHRINDENRAERYKTKGWTASVEGGYNALVAEGW 



GKGNNVRFYLQPQAQFTYLGVNGGFTDSEGTAVGLLGSGQWQSRAGIRAKTRFALRNGVN 
GKGNNVRFYLQPQAQFTYLGVNGGFTDSEGTAVGLLGSGQWQSRAGIRAKTRFALRNGVN 
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550 560 570 580 590 600 

orf35a KEAALSLKWLFX 
610 620 

The complete length ORF35a nucleotide sequence <SEQ ID 453> is: 

1 ATGTTCAGAG CTCAGCTTGG TTCAAATACT CGTTCTACCA AAATCGGCGA 

51 CGATGCCGAT TTTTCATTTT CAGACAAGCC GAAACCCGGC ACTTCCCATT 

101 ATTTTTCCAG CGGTAAAACC GATCAAAATT CATCCGAATA TGGGTATGAC 

151 GAAATCAATA TCCAAGGTAA AAACTACAAT AGCGGCATAC TCGCCGTCGA 

201 TAATATGCCC GTTGTTAAGA AATATATTAC AGATACTTAC GGGGATAATT 

251 TAAAGGATGC GGTTAAGAAG CAATTACAGG ATTTATACAA AACAAGACCC 

301 GAAGCTTGGG AAGAAAATAA AAAACGGACT GAGGAGGCGT ATATAGAACA 

351 GCTTGGACCA AAATTTAGTA TACTCAAACA GAAAAACCCC GATTTAATTA 

401 ATAAATTGGT AGAAGATTCC GTACTCACTC CTCATAGTAA TACATCACAG 

451 ACTAGTCTCA ACAACATCTT CAATAAAAAA TTACACGTCA AAATCGAAAA 

501 CAAATCCCAC GTCGCCGGAC AGGTGTTGGA ACTGACCAAG ATGACGCTGA 

551 AAGATTCCCT TTGGGAACCG CGCCGCCATT CCGACATCCA TATGCTGGAA 

601 ACTTCCGATA ATGCCCGCAT CCGCCTGAAC ACGAAAGATG AAAAACTGAC 

651 CGTCCATAAA GCGTATCAGG GCGGTGCGGA TTTCCTGTTC GGCTACGACG 

7 01 TGCGGGAGTC GGACAAACCC GCCCTGACCT TTGAAGAAAA AGTCAGCGGA 

751 CAATCCGGCG TGGTTTTGGA ACGCCGGCCG GAAAATCTGA AAACGCTCGA 

801 CGGGCGCAAA CTGATTGCGG CGGAAAAGGC AGACTCTAAT TCGTTTGCGT 

851 TTAAACAAAA TTACCGGCAG GGACTGTACG AATTATTGCT CAAGCAATGC 

901 GAAGGCGGAT TTTGCTTGGG CGTGCAGCGT TTGGCTATCC CCGAGGCGGA 

951 AGCGGTTTTA TATGCCCAAC AGGCTTATGC GGCAAATACT TTGTTCGGGC 

1001 TGCGTGCCGC CGACAGGGGC GACGACGTGT ATGCCGCCGA TCCGTCCCGT 

1051 CAAAAATTGT GGCTGCGCTT CATCGGCGGC CGGTCGCATC AAAATATACG 

1101 GGGCGGCGCG GCTGCGGACG GGCGGCGCAA AGGCGTGCAA ATCGGCGGCG 

1151 AGGTGTTTGT ACGGCAAAAT GAAGGCAGCC GGCTGGCAAT CGGCGTGATG 

1201 GGCGGCAGGG CTGGCCAGCA CGCATCAGTC AACGGCAAAG GCGGTGCGGC 

1251 AGGCAGTTAT TTGCATGGTT ATGGCGGGGG TGTTTATGCT GCGTGGCATC 

1301 AGTTGCGCGA TAAACAAACG GGTGCGTATT TGGACGGCTG GTTGCAATAC 

1351 CAACGTTTCA AACACCGCAT CAATGATGAA AACCGTGCGG AACGCTACAA 

14 01 AACCAAAGGT TGGACGGCTT CTGTCGAAGG CGGCTACAAC GCGCTTGTGG 

1451 CGGAAGGCGT TGTCGGAAAA GGCAATAATG TGCGGTTTTA CCTGCAACCG 

1501 CAGGCGCAGT TTACCTACTT GGGCGTAAAC GGCGGCTTTA CCGACAGCGA 

1551 GGGGACGGCG GTCGGACTGC TCGGCAGCGG TCAGTGGCAA AGCCGCGCCG 

1601 GCATTCGGGC AAAAACCCGT TTTGCTTTGC GTAACGGTGT CAATCTTCAG 

1651 CCTTTTGCCG CTTTTAATGT TTTGCACAGG TCAAAATCTT TCGGCGTGGA 

1701 AATGGACGGC GAAAAACAGA CGCTGGCAGG CAGGACGGCG CTCGAAGGGC 

1751 GGTTCGGCAT TGAAGCCGGT TGGAAAGGCC ATATGTCCGC ACGCATCGGA 

1801 TACGGCAAAA GGACGGACGG CGACAAAGAA GCCGCATTGT CGCTCAAATG 

1851 GCTGTTTTGA 

This encodes a protein having amino acid sequence <SEQ ID 454>: 

1 MFRAQLGSNT RSTKIGDDAD FSFSDKPKPG TSHYFSSGKT DQNSSEYGYD 

51 EINIQGKNYN SGILAVDNMP WKKYITDTY GDNLKDAVKK QLQDLYKTRP 

101 EAWEENKKRT EEAYIEQLGP KFSILKQKNP DLINKLVEDS VLTPHSNTSQ 

151 TSLNNIFNKK LHVKIENKSH VAGQVLELTK MTLKDSLWEP RRHSDIHMLE 

201 TSDNARIRLN TKDEKLTVHK AYQGGADFLF GYDVRESDKP ALTFEEKVSG 

251 QSGWLERRP ENLKTLDGRK LIAAEKADSN SFAFKQNYRQ GLYELLLKQC 

301 EGGFCLGVQR LAIPEAEAVL YAQQAYAANT LFGLRAADRG DDVYAADPSR 

351 QKLWLRFIGG RSHQNIRGGA AADGRRKGVQ IGGEVFVRQN EGSRLAIGVM 

401 GGRAGQHASV NGKGGAAGSY LHGYGGGVYA AWHQLRDKQT GAYLDGWLQY 

451 QRFKHRINDE NRAERYKTKG WTASVEGGYN ALVAEGWGK GNNVRFYLQP 

501 QAQFTYLGVN GGFTDSEGTA VGLLGSGQWQ SRAGIRAKTR FALRNGVNLQ 

551 PFAAFNVLHR SKSFGVEMDG EKQTLAGRTA LEGRFGIEAG WKGHMSARIG 

601 YGKRTDGDKE AALSLKWLF* 

Homology with a predicted ORF from N.sonorrhoeae 

ORF35 shows 51.7% identity over a 261aa overlap with a predicted ORF (ORF35ngh) from N. 
gonorrhoeae: 

orf35.pep PCRRQGDDVYAAHASRQKLWLRFIGGRSHQNIRG 34 

:::!:: I : I I I I I I : I : I : : I 

orfSSngh FTKVQERDDIAIYAQQAQAANTLFALRLNDKNSDIFDRTLPRKGLWLRVIDGHSNQWVQG 370 
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GAA-ADGWRKGVQIGGEVFVRQNEGSXLRIGVMGGRAGQHASVNGKG — GAAGSDLYGYG 
KTAPVEGYRKGVQLGGEVFTWQNESNQLSIGLMGGQAEQRSTFRNPDTDNLTTGNVKGFG 


91 


orf 35ngh 


430 




AGVYATWHQLQDKQTGAYVDSWMQYQRFRHRINTEYATERFTSKGITASIEftGYNALLAE 




orfSSngh 


490 




GIVGKGNNVRFYLQPQAQFTYLGVNGGFTDSEGTAVGLLGSGQWQSRAGIRAKTRFALRN 


211 


orf35ngh 


HFTKKGNSLRVYLQPQAQLTYLGVNGKFSDSENAQVNLLGSRQLQSRVGVQAKAQFAFTN 


550 


orf 35. pep 


GVNLQPFAAFNVLHRSKSFGVEMDGEKQTLAGRTALEGRFGIEAGWKGHMSA 


263 


orf35ngh 


GVTFQPFVAVNSIYQQKPFGVEIDGDRRVINNKTVIETQLGVAAKIKSHLTLQASFNRQT 


610 



A partial ORF35ngh nucleotide sequence <SEQ ID 455> is predicted to encode a protein having 
partial amino acid sequence <SEQ ID 456>: 

1 ..KKLRDRNSEY WKEETYHIKS NGRTYPNIPA LFPKHPFDPF ENINNSKKIS 

51 FYDKEYTEDY LVGFARGFGV EKRNGEEEKP LRQYFKDCVN TENSNNDNCK 

101 ISSFGNYGPI LIKSDIFALA SQIKNSHINS EILSVGNYIE WLRPTLNKLT 

151 GWQEHLYAGL DPFHYIEVTD NSHVIGQTID LGALELTNSL WKPRWNSNID 

201 YLITKNAEIR FNTKNESLLV KEDYAGGARF RFAYDLKDKV PEIPVLTFEK 

251 NITGTSDIIF EGKALDNLKH LDGHQIVECVN DTADKDAFRL SSKYRKGIYT 

301 LSLQQRPEGF FTKVQERDDI AIYAQQAQAA NTLFALRLND KNSDIFDRTL 

351 PRKGLWLRVI DGHSNQWVQG KTAPVEGYRK GVQLGGEVFT WQNESNQLSI 

401 GLMGGQAEQR STFRNPDTDN LTTGNVKGFG AGVYATWHQL QDKQTGAYVD 

451 SWMQYQRFRH RINTEYATER FTSKGITASI EAGYNALLAE HFTKKGNSLR 

501 VYLQPQAQLT YLGVNGKFSD SENAQVNLLG SRQLQSRVGV QAKAQFAFTN 

551 GVTFQPFVAV NSIYQQKPFG VEIDGDRRVI NNKTVIETQL GVAAKIKSHL 

601 TLQASFNRQT SKHHHAKQGA LNLQWTF* 

Based on this prediction, these proteins from N. meningitidis and N.gonorrhoeae, and their epitopes, 
could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 55 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 457>: 

1 . . GCGGAATATG TTCAGTTCTC TATAGATTTG TTCAGTGTGG GTAAATCGGG 

51 GGGCGGTATA CCTAAGGCTA AGCCTGTGTT TGATGCGAAA CCGAGATGGG 

101 AGGTTGATAG GAAGCTTAAT AAATTGACAA CTCGTGAGCA GGTGGAGAAA 

151 AATGTTCAGG AAACGAGAAG AAGGAGTCAG AGTAGTCAGT TTAAAGCCCA 

201 TGCGCAACGA GAATGGGAAA ATAAAACAGG GTTAGATTTT AATCATTTTA 

251 TAGGTGGTGA TATCAATAAA AAAGGCACAG TAACAGGAGG GCATAGTCTA 

301 ACCCGTGGTG ATGTACGGGT GATACAACAA ACCTCGGCAC CTGATAAACA 

351 TGGGGT.TTA TCAAGCGACA GTGGAAATTN A 

This corresponds to the amino acid sequence <SEQ ID 458; ORF46>: 

1 . .AEYVQFSIDL FSVGKSGGGI PKAKPVFDAK PRWEVDRKLN KLTTREQVEK 

51 NVQETRRRSQ SSQFKAHAQR EWENKTGLDF NHFIGGDINK KGTVTGGHSL 

101 TRGDVRVIQQ TSAPDKHGXL SSDSGNX 

Further work revealed further partial nucleotide sequence <SEQ ID 459>: 

1 . .GCAGTGTGCC TnCCGATGCA TGCACACGCC TCAnATTTGG CAAACGATTC 

51 TTTTATCCGG CAGGTTCTCG ACCGTCAGCA TTTCGAACCC GACGGGAAAT 

101 ACCACCTATT CGGCAGCAGG GGGGAACTTG CCGAGCGCCA GTCTCATATC 

151 GGATTGGGAA AAATACAAAG CCATCAGTTG GGCAACCTGA TGATTCAACA 

201 GGCGGCCATT AAAGGAAATA TCGGCTACAT TGTCCGCTTT TCCGATCACG 

251 GGCACGAAGT CCATTCCCCs TTCGACAACC ATGCCTCACA TTCCGATTCT 

301 GATGAAGCCG GTAGTCCCGT TGACGGATTT AGCCTTTACC GCATCCATTG 

351 GGACGGATAC GAACACCATC CCGCCGACGG CTATGACGGG CCACAGGGCG 

4 01 GCGGCTATCC CGCTCCCAAA GGCGCGAGGG ATATATACAG TTACGACATA 
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4 51 AAAGGCGTTG CCCAAAATAT CCGCCTCAAC CTGACCGACA ACCGCAGCAC 

501 CGGACAACGG CTTGCCGACC GTTTCCACAA TGCCGGTAGT ATGCTGACGC 

551 AAGGAGTAGG CGACGGATTC AAACGCGCCA CCCGATACAG CCCCGAGCTG 

601 GACAGATCGG GCAATGCCGC CGAAGCCTTC AACGGCACTG CAGATATCGT 

651 TAAAAACATC ATCGGCGCTG CAGGAGAAAT TGT 

This corresponds to the amino acid sequence <SEQ ID 460; ORF46-l>: 

1 . ■ AVCLPMHAHA SXLANDSFIR QVLDRQHFEP DGKYHLFGSR GELAERQSHI 

51 GLGKIQSHQL GNLMIQQAAI KGNIGYIVRF SDHGHEVHSP FDNHASHSDS 

101 DEAGSPVDGF SLYRIHWDGY EHHPADGYDG PQGGGYPAPK GARDIYSYDI 

151 KGVAQNIRLN LTDNRSTGQR LADRFHNAGS MLTQGVGDGF KRATRYSPEL 

201 DRSGNAAEAF NGTADIVKNI IGAAGEI 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORP from N.sonorrhoeae 

ORF46 shows 98.2% identity over a lllaa overlap with a predicted ORF (ORF46ng) from N. 
gonorrhoeae: 

orf4 6.pep AEYVQFSIDLFSVGKSGGGIPKAKPVFDAKPRWEVDRKLNKLTTR 45 

I I I I I ! I ! I I I ! I 1 I I I I I I I I I I I I I I I I 
orf 4 6ng PKTGVPFDGKGFPNFEKHVKYDTKLDIQELSGGGIPKAKPVFDAKPRWEVDRKLNKLTTR 217 

orf 4 6 . pep EQVEKNVQETRRRSQSSQFKAHAQREWENKTGLDFNHFIGGDINKKGTVTGGHSLTRGDV 105 

I I I I I I I i I I I I I I ! I I I I I I I i I I I I I I I I ! I I M 1 I I I I I I I I I I : I I I I I I I I I I I I 
orf 4 6ng EQVEKNVQETRRRSQSSQFKAHAQREWENKTGLDFNHFIGGDINKKGAVTGGH3LTRGDV 277 

orf 4 6. pep RVIQQTSAPDKHGXLSSDSGN 126 

I I I I I I I I I I ! i I I I I I I I I 
orf46ng RVIQQTSAPDKHGVLSSDSGN 298 

A partial ORF46ng nucleotide sequence <SEQ ID 461> is predicted to encode a protein having 
partial amino acid sequence <SEQ ID 462>: 

1 ..RRLKHCCHAR LGSAFHRKQD GAHQRFGRYG ATQRLCRSSH PRLGSPKPQC 

51 RTRHRSRQQY LYGSHPHQRD WSCPGKIQLG RHHGTSCRAV ADXRDRICER 

101 EIRRQRQXCR CRLGKIPSLS IPKYPLKLEQ RYGKENITSS TVPPSNGKNV 

151 KLADQRHPKT GVPFDGKGFP NFEKHVKYDT KLDIQELSGG GIPKAKPVFD 

201 AKPRWEVDRK LNKLTTREQV EKNVQETRRR SQSSQFJCAHA QREWENKTGL 

251 DFNHFIGGDI NKKGAVTGGH SLTRGDVRVI QQTSAPDKHG VLSSDSGN* 

Further work revealed the complete gonococcal DNA sequence <SEQ ID 463>: 



1 TTGGGCATTT CCCGCAAAAT ATCCCTTATT CTGTCCATAC TGGCAGTGTG 

51 CCTGCCGATG CATGCACACG CCTCAGATTT GGcaAACGAT CCCTTTATCC 

101 GgCaggttcT CGaccGTCAG CATTTCGaac ccgacggGAa ATACCaCCTA 

151 TTcggCaGCA GGGGGGAGCT TgccnagcGC aacggccATa tcggattggG 

201 aaacaTAcaa Agccatcagt tGggccacct gatgattcaa caggcggccg 

251 ttgaaggaaA TAtcgGctac attgtccgct tttccgatca cgggcacaaa 

301 ttccattcgc ccttcGAcaa ccaTGCCTCA CATTCCGATT CTGACGAAGC 

351 CGGTAGTCCC GTTGACGGAT TCAGCCTTTA CCGCATCCAT TGGGACGGAT 

401 ACGAACACCA TCCCGCCGAC GGCTATGACG GGCCACAGGG CGGCGGCTAT 

451 CCCGCTCCCA AAGGCGCGAG GGATATATAC AGCTACGACA TAAAAGGCGT 

501 TGCCCAAAAT ATCCGCCTCA ACCTGACCGA CAACCGCAGC ACCGGACAAC 

551 GGCTTGCCGA CCGTTTCCAC AATGCCGGCG CTATGCTGAC GCAAGGAGTA 

601 GGCGACGGAT TCAAACGCGC CACCCGATAC AGCCCCGAGC TGGACAGATC 

651 GGGCAATGCc gccGAAGCCT TCAACGGCAC TGCAGATATC GTCAAAAACA 

701 TCATCGGCGC GGCAGGAGAA ATTGTCGGCG CAGGCGATGC CGTGCagGGT 

751 ATAAGCGAAG GCTCAAACAT TGCTGTCATG CACGGCTTGG GTCTGCTTTC 

801 CACCGAAAAC AAGATGGCGC GCATCAACGA TTTGGCAGAT ATGGCGCAAC 

851 TCAAAGACTA TGCCGCAGCA GCCATCCGCG ATTGGGCAGT CCAAAACCCC 

901 AATGCCGCAC AAGGCATAGA AGCCGTCAGC AATATCTTTA TGGCAGCCAT 

951 CCCCATCAAA GGGATTGGAG CTGTCCGGGG AAAATACGGC TTGGGCGGCA 

1001 TCACGGCACA TCCTGTCAAG CGGTCGCAGA TGGGCGCGAT CGCATTGCCG 

1051 AAAGGGAAAT CCGCCGTCAG CGACAATTTT GCCGATGCGG CATACGCCAA 

1101 ATACCCGTCC CCTTACCATT CCCGAAATAT CCGTTCAAAC TTGGAGCAGC 
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1151 GTTACGGCAA AGAAAACATC ACCTCCTCAA CCGTGCCGCC GTCAAACGGC 

1201 AAAAATGTCA AACTGGCAGA CCAACGCCAC CCGAAGACAG GCGTACCGTT 

1251 TGACGGTAAA GGGTTTCCGA ATTTTGAGAA GCACGTGAAA TATGATACGA 

1301 AGCTCGATAT TCAAGAATTA TCGGGGGGCG GTATACCTAA GGCTAAGCCT 

1351 GTGTTTGATG CGAAACCGAG ATGGGAGGTT GATAGGAAGC TTAATAAATT 

1401 GACAACTCGT GAGCAGGTGG AGAAAAATGT TCAGGAAACG AGAAGAAGGA 

1451 GTCAGAGTAG TCAGTTTAAA GCCCATGCGC AACGAGAATG GGAAAATAAA 

1501 ACAGGGTTAG ATTTTAATCA TTTTATAGGT GGTGATATCA ATAAGAAAGG 

1551 CACAGTAACA GGAGGGCATA GTCTAACCCG TGGTGATGTA CGGGTGATAC 

1601 AACAAACCTC GGCACCTGAT AAACATGGGG TTTATCMGC GACAGTGGAA 

1651 ATTAAAAAGC CTGATGGAAG TTGGGAGGTG AAAACGAAAA AAGGTGGGAA 

1701 AGTGATGACC AAGCACACCA TGTTCCCAAA AGATTGGGAT GAGGCTAGAA 

1751 TTAGGGCTGA AGTTACTTCG GCTTGGGAAA GTAGAATAAT GCTTAAGGAT 

1801 AATAAATGGC AGGGTACAAG TAAATCGGGT ATTAAAATAG AAGGATTTAC 

1851 CGAACCTAAT AGAACAGCAT ATCCCATTTA TGAATAG 

This corresponds to the amino acid sequence <SEQ ID 464; ORF46ng-l>: 

1 LGISRKISLI LSILAVCLPM HAHA SDLAND PFIRQVLDRQ HFEPDGKYHL 

51 FGSRGELAXR NGHIGLGNIQ SHQLGHLMIQ QAAVEGNIGY IVRFSDHGHK 

101 FHSPFDNHAS HSDSDEAGSP VDGFSLYRIH WDGYEHHPAD GYDGPQGGGY 

151 PAPKGARDIY SYDIKGVAQN IRLNLTDNRS TGQRLADRFH NAGAMLTQGV 

201 GDGFKRATRY SPELDRSGNA AEAFNGTADI VKNIIGAAGE IVGAGDAVQG 

251 ISEGSNIAVM HGLGLLSTEN KMARINDLAD MAQLKDYAAA AIRDWAVQNP 

301 NAAQGIEAVS NIFMAAIPIK GIGAVRGKYG LGGITAHPVK RSQMGAIALP 

351 KGKSAVSDNF ADAAYAKYPS PYHSRNIRSN LEQRYGKENI TSSTVPPSNG 

401 KNVKLADQRH PKTGVPFDGK GFPNFEKHVK YDTKLDIQEL SGGGIPKAKP 

451 VFDAKPRWEV DRKLNKLTTR EQVEKNVQET RRRSQSSQFK AHAQREWENK 

501 TGLDFNHFIG GDINKKGTVT GGHSLTRGDV RVIQQTSAPD KHGVYQATVE 

551 IKKPDGSWEV KTKKGGKVMT KHTMFPKDWD EARIRAEVTS AWESRIMLKD 

601 NKWQGTSKSG IKIEGFTEPN RTAYPIYE* 

ORF46ng-l and ORF46-1 show 94.7% identity in 227 aa overlap: 

10 20 30 40 

AVCLPMHAHASXLANDSFIRQVLDRQHFEPDGKYHLFGSRGELAER 

LGISRKISLILSILAVCLPMHAHASDLANDPFIRQVLDRQHFEPDGKYHLFGSRGELAXR 
10 20 30 40 50 60 



orf46ng-l 



)rf4 5-l.pep QSHIGLGKIQSHQLGNLMIQQAAIKGNIGYIVRFSDHGHEVHSPFDNHASHSDSDEAGSP 
)rf4 6ng-l NGHIGLGNIQSHQLGHLMIQQAAVEGNIGYIVRFSDHGHKFHSPFDNHASHSDSDEAGSP 



VDGFSLYRIHWDGYEHHPADGYDGPQGGGYPAPKGARDIYSYDIKGVAQNIRLNLTDNRS 
VDGFSLYRIHWDGYEHHPADGYDGPQGGGYPAPKGARDIYSYDIKGVAQNIRLNLTDNRS 



TGQRLADRFHNAGSMLTQGVGDGFKRATRYSPELDRSGNAAEAFNGTADIVKNIIGAAGE 
I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
TGQRLADRFHNAGAMLTQGVGDGFKRATRYSPELDRSGNAAEAFNGTADIVECNIIGAAGE 



orf46-l.pep I 
I 

orf 4 6ng-l IVGAGDAVQGISEGSNIAVMHGLGLLSTENKMARINDLADMAQLKDYAAAAIRDWAVQNP 
250 260 270 280 290 300 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF46ng-l shows 87.4% identity over a 486aa overlap with an ORF (ORF46a) from sfrain A of 
N. meningitidis: 
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LGISRKISLILSILAVCLPMHAHASDLANDSFIRQVLDRQHFEPDGKYHLFGSRGELAER 
LGISRKISLILSILAVCLPMHAHASDLANDPFIRQVLDRQHFEPDGKYHLFGSRGELAXR 



SGHIGLGNIQSHQLGNLFIQQAAIKGNIGYIVRFSDHGHEVHSPFDNHASHSDSDEAGSP 
NGHIGLGNIQSHQLGHLMIQQAAVEGNIGYIVRFSDHGHKFHSPFDNHASHSDSDEAGSP 



VDGFSLYRIHWDGYEHHPADGYDGPQGGGYPAPKGARDIYSYDIKGVAQNIRLNLTDNRS 
VDGFSLYRIHWDGYEHHPADGYDGPQGGGYPAPKGARDIYSYDIKGVAQNIRLNLTDNRS 



TGQRLVDRFHNTGSMLTQGVGDGFKRATRYSPELDRSGNAAEAFNGTADIVKNIIGAAGE 
TGQRLADRFHNAGAMLTQGVGDGFKRATRYSPELDRSGNAAEAFNGTADIVKNIIGAAGE 



IVGAGDAVQGISEGSNIAVMHGLGLLSTENKMARINDLADMAQLKDYAAAAIRDWAVQNP 
IVGAGDAVQGISEGSNIAVMHGLGLLSTENKMARINDLADMAQLKDYAAAAIRDWAVQNP 



NAAQGIEAVSNIFTAVIPVKGIGAVRGKYGLGGITAHPVKRSQMGEIALPKGKSAVSDNF 
NAAQGIEAVSNIFMAAIPIKGIGAVRGKYGLGGITAHPVKRSQMGAIALPKGKSAVSDNF 



orf46a.pep 
orf46ng-l 



ADAAYAKYPSPYHSRNIRSNLEQRYGKENITSSTVPPSNGKNVKLANKRHPKTKVPFDGK 
ADAAYAKYPSPYHSRNIRSNLEQRYGKENITSSTVPPSNGKNVKLADQRHPKTGVPFDGK 



GFPNFEKDVKYDTRINTAVPQVN PIDEPVFN— PKGSVGSAHSWSITARIQYAKLP 

GFPNFEKHVKYDTKLD--IQELSGGGIPKAKPVFDAKPRWEVDRKLN-KLTTREQVEKNV 



orf 46a.pep 
orf4 6ng-l 



RQGRIRYIPPKNYSPSAPLPKGPNNGYLDKFGNEWTKGPSRTKGQEFEWDVQLSKTGREQ 
QETRRRSQSSQFKAHAQREWENKTGLDFNHFIGGDINKKGTVTGGHSLTRGDVRVIQQTS 



The complete length ORF46a DNA sequence <SEQ ID 465> is: 



TTGGGCATTT 
CCTGCCGATG 
GGCAGGTTCT 
TTCGGCAGCA 
AAACATACAA 
TTAAAGGAAA 
GTCCATTCCC 
CGGTAGTCCC 
ACGAACACCA 
CCCGCTCCCA 
TGCCCAAAAT 
GGCTTGTCGA 
GGCGACGGAT 
GGGCAATGCC 
TCATCGGCGC 
ATAAGCGAAG 
CACCGAAAAC 



CCCGCAAAAT 
CATGCACACG 
CGACCGTCAG 
GGGGGGAACT 
AGCCATCAGT 
TATCGGCTAC 
CCTTCGACAA 
GTTGACGGAT 
TCCCGCCGAC 
AAGGCGCGAG 
ATCCGCCTCA 
CCGTTTCCAC 
TCAAACGCGC 
GCCGAAGCTT 
GGCAGGAGAA 
GCTCAAACAT 
AAGATGGCGC 



ATCCCTTATT 
CCTCAGATTT 
CATTTCGAAC 
TGCCGAGCGC 
TGGGCAACCT 
ATTGTCCGCT 
CCATGCCTCA 
TCAGCCTTTA 
GGCTATGACG 
GGATATATAC 
ACCTGACCGA 
AATACCGGTA 
CACCCGATAC 
TCAACGGCAC 
ATTGTCGGCG 
TGCTGTTATG 
GCATCAACGA 



CTGTCCATAC 
GGCAAACGAT 
CCGACGGGAA 
AGCGGTCATA 
GTTCATCCAG 
TTTCCGATCA 
CATTCCGATT 
CCGCATCCAT 
GGCCACAGGG 
AGCTACGACA 
CAACCGCAGC 
GTATGCTGAC 
AGCCCCGAGC 
TGCAGATATC 
CAGGCGATGC 
CACGGCTTGG 
TTTGGCAGAT 



TGGCAGTGTG 
TCTTTTATCC 
ATACCACCTA 
TCGGATTGGG 
CAGGCGGCCA 
CGGGCACGAA 
CTGATGAAGC 
TGGGACGGAT 
CGGCGGCTAT 
TAAAAGGCGT 
ACCGGACAAC 
GCAAGGAGTA 
TGGACAGATC 
GTCAAAAACA 
CGTGCAGGGT 
GTCTGCTTTC 
ATGGCGCAAC 
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851 TCAAAGACTft TGCCGCAGCA GCCATCCGCG ATTGGGCAGT CCAAAACCCC 

901 AATGCCGCAC AAGGCATAGA AGCCGTCAGC AATATCTTTA CGGCAGTCAT 

951 CCCCGTCAAA GGGATTGGAG CTGTTCGGGG AAAATACGGC TTGGGCGGCA 

1001 TCACGGCACA TCCTGTCAAG CGGTCGCAGA TGGGCGAGAT CGCATTGCCG 

1051 AAAGGGAAAT CCGCCGTCAG CGACAATTTT GCCGATGCGG CATACGCCAA 

1101 ATACCCGTCC CCTTACCATT CCCGAAATAT CCGTTCAAAC TTGGAGCAGC 

1151 GTTACGGCM AGAAAACATC ACCTCCTCAA CCGTGCCGCC GTCAAACGGA 

1201 AAGAATGTGA AACTGGCAAA CAAACGCCAC CCGAAGACCA AAGTGCCGTT 

1251 TGACGGTAAA GGGTTTCCGA ATTTTGAAAA AGACGTAAAA TACGATACGA 

1301 GAATTAATAC CGCTGTACCA CAAGTGAATC CTATAGATGA ACCCGTCTTT 

1351 AATCCTAAAG GTTCTGTCGG ATCGGCTCAT TCTTGGTCTA TAACTGCCAG 

1401 AATTCAATAC GCAAAATTAC CAAGGCAAGG TAGAATCAGA TATATCCCAC 

14 51 CTAAAAATTA CTCTCCTTCA GCACCGCTAC CAAAAGGACC TAATAATGGA 

1501 TATTTGGATA AATTTGGTAA TGAATGGACT AAAGGTCCAT CAAGAACTAA 

1551 AGGTCAAGAA TTTGAATGGG ATGTTCAATT GTCTAAAACA GGAAGAGAGC 

1601 AACTTGGATG GGCTAGTAGG GATGGTAAGC ATTTAAATAT ATCAATTGAT 

1651 GGAAAGATTA CACACAAATG A 

This corresponds to the amino acid sequence <SEQ ID 466>: 

1 LGISRKISLI LSILAVCLPM HAHA SDLAND SFIRQVLDRQ HFEPDGKYHL 

51 FGSRGELAER SGHIGLGNIQ SHQLGNLFIQ QAAIKGNIGY IVRFSDHGHE 

101 VHSPFDNHAS HSDSDEAGSP VDGFSLYRIH WDGYEHHPAD GYDGPQGGGY 

151 PAPKGARDIY SYDIKGVAQN IRLNLTDNRS TGQRLVDRFH NTGSMLTQGV 

201 GDGFKRATRY SPELDRSGNA AEAFNGTADI VKNIIGAAGE IVGAGDAVQG 

251 ISEGSNIAVM HGLGLLSTEN KMARINDLAD MAQLKDYAAA AIRDWAVQNP 

301 NAAQGIEAVS NIFTAVIPVK GIGAVRGKYG LGGITAHPVK RSQMGEIALP 

351 KGKSAVSDNF ADAAYMYPS PYHSRNIRSN LEQRYGKENI TSSTVPPSNG 

401 KNVKLANKRH PKTKVPFDGK GFPNFEKDVK YDTRINTAVP QVNPIDEPVF 

451 NPKGSVGSAH SWSITTU^IQY AKLPRQGRIR YIPPKNYSPS APLPKGPNNG 

501 YLDKFGNEWT KGPSRTKGQE FEWDVQLSKT GREQLGWASR DGKHLNISID 

551 GKITHK* 

Based on this analysis, including the presence of a RGD sequence in the gonococcal protein, typical 
of adhesins, it is predicted that the proteins from N. meningitidis and N.gonorrhoeae, and their 
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 56 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 467>: 

1 ATGAATATTC ACACCCTGCT CTCCAAACAA TGGACGCTGC CGCCATTCCT 

51 GCCGAAACGG CTGCTGCTGT CCCTGCTGAT ACTGCTTGCC CCCAATGCGG 

101 TGTTTTGGGT TTTGGCACTG CTGACCGCCA CCGCCCGCCC GATTGTCAAT 

151 TTGGACTATC TTCCCGCCGC GCTGCTGATC GCCCTGCCTT GGCGTTTCGT 

201 CAAAATTGCC GGCGTATTGG CGTTTTGGCT GGCGGTTTTG TTTGACGGGC 

251 TGATGATGGT GATCCAACTC TTCCCTTTTA TGGATCTCAT CGGCGCCATC 

301 AACCTCGTCC CCTTCATCCT GACCGCCCCC GCCCCTTATC AGATAATGAC 

351 CGGGCTG... 

This corresponds to the amino acid sequence <SEQ ID 468; ORF48>: 

1 MNIHTLLSKQ WTLPPFLPKR LLLSLLILLA PNAVFWVLAL LTATARPIVN 

51 LDYLPAALLI ALPWRFVKIA GVLAFWLAVL FDGLMMVIQL FPFMDLIGAI 

101 NLVPFILTAP APYQIMTGL. . . 

Further work revealed the complete nucleotide sequence <SEQ ID 469>: 

1 ATGAATATTC ACACCCTGCT CTCCAAACAA TGGACGCTGC CGCCATTCCT 

51 GCCGAAACGG CTGCTGCTGT CCCTGCTGAT ACTGCTTGCC CCCAATGCGG 

101 TGTTTTGGGT TTTGGCACTG CTGACCGCCA CCGCCCGCCC GATTGTCAAT 

151 TTGGACTATC TTCCCGCCGC GCTGCTGATC GCCCTGCCTT GGCGTTTCGT 

201 CAAAATTGCC GGCGTATTGG CGTTTTGGCT GGCGGTTTTG TTTGACGGGC 

251 TGATGATGGT GATCCAACTC TTCCCTTTTA TGGATCTCAT CGGCGCCATC 

301 AACCTCGTCC CCTTCATCCT GACCGCCCCC GCCCCTTATC AGATAATGAC 

351 CGGGCTGTTG CTGCTGTATA TGCTGGCGAT GCCGTTTGTG TTGCAGAAAG 
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401 CCGCCGCCAA AACCGACTTC CGGCACATTG CCGTCTGCGC CGCCGTTGTG 

451 GCGGCAGCCG GCTATTTCAC CGGCCATTTG AGTTACTACG ACCGGGGTCG 

501 GATGGCCAAT ATCTTCGGCG CAAACAACTT CTACTACGCC AAAAGTCAGG 

551 CGATGCTCTA CACCGTCAGC CAGAATGCCG ACTTTATTAC CGCCGGCCTG 

601 GTCGATCCCG TCTTCCTCCC CTTGGGCAAT CAACAGCGTG CCGCCACGCA 

651 TCTGAACGAG CCGAAATCTC AAAAAATCCT CTTTATCGTC GCCGAATCTT 

701 GGGGGCTGCC GGCCAATCCC GAACTTCAAA ACGCCACTTT TGCCAAACTG 

751 CTGGCGCAAA AAGACCGTTT TTCGGTTTGG GAAAGCGGCA GTTTTCCCTT 

801 CATCGGCGCG ACGGTCGAAG GCGAAATGCG CGAACTGTGT GCCTACGGCG 

851 GTTTGCGCGG GTTCGCACTG CGCCGCGCGC CCGACGAAAA ATTTGCCCGC 

901 TGCCTCCCCA ACCGTTTGAA ACAAGAAGGT TACGCCACCT TTGCGATGCA 

951 CGGCGCGGGC AGTTCGCTTT ACGACCGCTT CAGCTGGTAT CCGAGGGCGG 

1001 GCTTTCAAGA AATCAAAACC GCCGAAAACC TGATCGGTAA AAAAACCTGC 

1051 GCCATTTTCG GCGGCGTGTG CGACAGCGAG CTGTTCGGCG AAGTGTCGGC 

1101 ATTTTTCAAA AAACACGACA AGGGACTGTT TTACTGGATG ACGCTGACCA 

1151 GCCACGCCGA CTATCCCGAA TCCGACATTT TCAACCACAG GCTCAAATGC 

1201 ACCGAATATG GCCTGCCCGC CGAAACCGAC CTCTGCCGCA ATTTCAGCCT 

1251 GCACACCCAA TTCTTCGACC AACTGGCGGA TTTGATCCAA CGCCCCGAAA 

1301 TGAAAGGCAC GGAAGTCATC ATCGTCGGCG ACCATCCGCC GCCCGTCGGC 

1351 AACCTCAATG AAACCTTCCG CTACCTCAAA CAGGGGCACG TCGCCTGGCT 

1401 GAACTTCAAA ATCAAATAA 

This corresponds to the amino acid sequence <SEQ ID 470; ORF48-l>: 



1 MHIHTLLSKQ WTLPPFLPKR LLLSLLILLA PNAVFWVLAL LTATA RPIVN 

51 LDYLPAALLI ALPWRFVKIA G VLAFWLAVL FDGLMMVI Q L FPFMDLIGAI 

101 NLVPFI LTAP APYQ IMTGLL LLYMLAMPFV L QKAAAKTDF R HIAVCAAW 

151 AAAGYFTG HL SYYDRGRMAN IFGANNFYYA KSQAMLYTVS QNADFITAGL 

201 VDPVFLPLGN QQRAATHLNE PKSQKILFIV AESWGLPANP ELQNATFAKL 

251 LAQKDRFSVW ESGSFPFIGA TVEGEMRELC AYGGLRGFAL RRAPDEKFAR 

301 CLPNRLKQEG YATFAMHGAG SSLYDRFSWY PRAGFQEIKT AENLIGKKTC 

351 AIFGGVCDSE LFGEVSAFFK KHDKGLFYWM TLTSHADYPE SDIFNHRLKC 

401 TEYGLPAETD LCRNFSLHTQ FFDQLADLIQ RPEMKGTEVI IVGDHPPPVG 

451 NLNETFRYLK QGHVAWLNFK IK* 

Computer analysis of this amino acid sequence gave the following resuUs: 



Homology with a predicted ORF from N.menimitidis (strain A) 

ORF48 shows 94.1% identity over a 119aa overlap with an ORF (ORF48a) from strain A of A^. 
meningitidis: 

10 20 30 40 50 60 

or f 4 8 . pep MNIHTLLSKQWTLPPFLPKRLLL5LLILLAPNAVFWVLALLTATAR PIVNLDYLPAALLI 

I I I I I I I I I I I I I I I !! I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I I I 

orf48a MNIHTLLSKQWTLPPFLPKRLLLSLLILLXPHAVFWVLALLTATA RPIVNLXYLPAALLI 



ALPWRFVKIAG VLAFWLAVLFDGLMMVI Q LFPFMDLIGAINLVPFI LTAPAPYQ IMTGL 

I I ! I I III I II I I I I 1 II I I I I I I I I I I I I I II I I I I II I I I ! I I II I I I II I I 

ALPWRXVKIXG VLAXWLAVLFDGLMMVI Q LFPFMDLIGAINLVPFI XTAPALYQ IMTGLL 



or f 4 8a LLYMLANPFVLQKAAAKTDFRHIAACAAWVAAGYFTGHLSXYDRGRMANIFGANNFYYA 
130 140 150 160 170 180 

The complete length ORF48a nucleotide sequence <SEQ ED 471> is: 

1 ATGAATATTC ACACCCTGCT CTCCAAACAA TGGACGCTGC CGCCATTCCT 

51 GCCGAAACGG CTGCTGCTGT CCCTGCTGAT ACTGCTNNCC CCCAATGCGG 

101 TGTTTTGGGT TTTGGCACTG CTGACCGCCA CCGCCCGCCC GATTGTCAAT 

151 TTGGANTACC TTCCCGCCGC GCTGCTGATC GCCCTGCCTT GGCGTNTCGT 

201 CAAAATTGNC GGCGTATTGG CGTNTTGGCT GGCGGTTTTG TTTGACGGGC 

251 TGATGATGGT GATCCAACTC TTCCCTTTTA TGGATCTCAT CGGCGCCATC 

301 AACCTCGTCC CCTTCATCNT GACCGCCCCC GCCCTTTATC AGATAATGAC 

351 CGGGCTGTTA CTGCTGTATA TGCTGGCGAT GCCGTTTGTG TTGCAGAAAG 

4 01 CCGCCGCCAA AACCGACTTC CGACACATTG CCGCCTGTGC CGCCGTTGTG 

4 51 GTGGCAGCCG GCTATTTTAC CGGCCATTTG AGTTANTACG ACCGGGGGCG 
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501 


GATGGCCAAT 


ATCTTCGGCG i 


551 


CGATGCTCTA 


CACCGTCAGC i 


601 


GTCGATCCCG 


TCTTCCTCCC i 


651 


TCTGAACGAG 


CCGAAATCTC . 


701 


GGGGGCTGCC 


GGCCAATCCC i 


751 


CTGGCGCAAA 


AAGANCGTTT ' 


801 


CATCGGCGCG 


ACGATCGAAG ( 


851 


GTTTGCGCGG 


GTTCGCACTG i 


901 


TGCCTCCCCA 


ACCGTTTGAA , 






AGTTCGCTTT i 


1001 


GCTTTCAAGA 


AATCAAAACC I 




GCCATTTTCG 


GCGGCGTGTG i 


1101 


ANTTTTCAAA 


AAACACGACA , 


1151 


GCCACGCCGA 


CTATCCCGAA ' 


1201 


ACCGAATATG 


GCCTGCCCGC i 


1251 


GCACACCCAA 


TTCTTCGACC i 


1301 


TGAAAGGCAC 


GGAAGTCATC ; 


1351 


AACCTCAATG 


AAACCTTCCG i 


1401 


GAACTTCAAA 


ATCAAATAA 


This encodes a 


protein having amino acid 



CAAACAACTT 
CAGAATGCCG 
CTTGGGCAAT 
AAAAAATCCT 
GAACTTCAAA 
TTCGGTTTGG 
GCGAAATGCG 
CGCCGCGCGC 
ACAAGAAGGT 
ACGACCGCTT 
GCCGAAAACC 
CGACAGCGAG 
AGGGACTGTT 
TCNGACATTT 
CGAAACCGAC 
AACTGGCGGA 
ATCGTCGGCG 
CTACCTCAAA 



CTATTACGCC 
ACTTTATTAC 
CAACAGCGTG 
CTTTATCGTC 
ACGCCACTTT 
GAAAGCGGCA 
CGAACTGTGT 
CCGACGAAAA 
TACGCCACCT 
CAGCTGGTAT 
TGATCGGTAA 
CTGTTCGGCG 
TTACTGGATG 
TCAACCACAG 
NTCTGCCGCA 
TTTGATCCAA 
ACCATCCGCC 
CAGGGGCACG 



AAAAGTCAGG 
CGCCGGCCTG 
CCGCCACGCA 
GCCGAATCTT 
TGCCAAACTG 
GTTTTCCCTT 
GCCTACGGCG 
ATTTGCCCGC 
TTGCGATGCA 
CCGAGGGCGG 
AAAAACCTGC 
AAGTGTCGGC 
ACGCTGACCA 
GCTCAAATGC 
ATTTCAGCCT 
CGCCCCGAAA 
GCCCGTCGGC 
TCGNCTGGCT 



NLVPFIXTAP ALYQ IMTGLL LLYMLAMPFV L QKAAAKTDF 
VAAGYFTG HL SXYDRGRMAN IFGANNFYYA KSQAMLYTVS 
VDPVFLPLGN QQRAATHLNE PKSQKILFIV AESWGLPANP 
LAQKXRFSVW ESGSFPFIGA TIEGEMRELC AYGGLRGFAL 
CLPNRLKQEG YATFAMHGAG SSLYDRFSWY PRAGFQEIKT 
AIFGGVCDSE LFGEVSAXFK KHDKGLFYWM TLTSHADYPE 
TEYGLPAETD XCRNFSLHTQ FFDQLADLIQ RPEMKGTEVI 
NLNETFRYLK QGHVXWLNFK IK* 



R HIAACAAVV 
QNADFITAGL 
ELQNATFAKL 
RRAPDEKFAR 
AENLIGKKTC 
SDIFNHRLKC 
IVGDHPPPVG 



ORF48a and ORF48-1 show 96.8% identity in 472 aa overlap: 



MNIHTLLSKQWTLPPFLPKRLLLSLLILLXPNAVFWVLALLTATARPIVNLXYLPAALLI 
MNIHTLLSKQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI 



orf 4 8a . pep ALPWRXVKIXGVLAXWLAVLFDGLMMVIQLFPFMDLIGAINLVPFIXTAPALYQIMTGLL 
orf48-l ALPWRFVKIAGVLAFWLAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGLL 



orf 48a .pep LLYMLAMPFVLQKAAAKTDFRHIAACAAVWAAGYFTGHLSXYDRGRMANIFGANNFYYA 
o r f 4 8 - 1 LL YMLAMP FVLQKAAAKT D FRH lAVCAAVVAAAGYFTGHLS YYDRGRMAN I FGANN FY YA 



)rf48a.pep KSQAMLYTVSQNADFITAGLVDPVFLPLGNQQRAATHLNEPKSQKILFIVAESWGLPANP 
)rf48-l KSQAMLYTVSQNADFITAGLVDPVFLPLGNQQRAATHLNEPKSQKILFIVAESWGLPANP 



ELQNATFAKLLAQKXRFSVWESGSFPFIGATIEGEMRELCAYGGLRGFALRRAPDEKFAR 
ELQNATFAKLLAQKDRFSVWESGSFPFIGATVEGEMRELCAYGGLRGFALRRAPDEKFAR 



CLPNRLKQEGYATFAMHGAGSSLYDRFSWYPRAGFQEIKTAENLIGKKTCAIFGGVCDSE 
CLPNRLKQEGYATFAMHGAGSSLYDRFSWYPRAGFQEIKTAENLIGKKTCAIFGGVCDSE 
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orf 48a . pep LFGEVSAXFKKHDKGLFYWMTLTSHADYPESDIFNHRLKCTEYGLPAETDXCRNFSLHTQ 
orf48-l LFGEVSAFFKKHDKGLFYWMTLTSHADYPESDIFNHRLKCTEYGLPAETDLCRNFSLHTQ 



orf48a.pep FFDQLADLIQRPEMKGTEVI I VGDHPPPVGNLNETFRYLKQGHVXWLNFKIKX 
orf 4 8-1 FFDQLADLIQRPEMKGTEVIIVGDHPPPVGNLNETFRYLKQGHVAWLNFKIKX 



Homology with a predicted ORF from N.sonorrhoeae 

ORF48 shows 97.5% identity over a 119aa overlap with a predicted ORF (ORF48ng) from N. 
gonorrhoeae: 

orf 48 .pep MNIHTLLSKQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI 60 

orf48ng MNIHALLSEQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI 60 

orf 48 .pep ALPWRFVKIAGVLAFWLAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGL 119 

orf48ng ALPWRFVKIAGVLAFWPAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGLL 120 

The ORF48ng nucleotide sequence <SEQ ID 473> was predicted to encode a protein having amino 
acid sequence <SEQ ID 474>: 



1 MNIHALLSEO WTLPPFLPKR LLLSLLILLA PNAVFWVLAL LTATA RPIVN 
51 LDYLPAALLI ALPWRFVKIA G VLAFWPAVL FDGLMMVI Q L FPFMDLIGAI 
101 NLVPFI LTAP APYQ IMTGLL LLYMLAMPFV L QKAAVKTDF RHIAVCAAW 
151 AAARYFTGPF ELLRTGGRWQ YVQHRRLLLS GSRASFRRRQ KADVLRRLGN 
201 PYASMGNGG.. 

Further work identified the complete gonococcal DNA sequence <SEQ ID 475>; 



1 ATGAATATTC ACGCCCTGCT 

51 GCCGAAACGG CTGCTGCTGT 

101 TGTTTTGGGT TTTGGCACTG 

151 TTGGACTACC TTCCCGCCGC 

201 CAAAATTGCC GGCGTATTGG 

251 TGATGATGGT GATCCAACTC 

301 AACCTCGTCC CCTTCATCCT 

351 CGGGCTGTTG CTGCTGTATA 

401 CCGCCGTCAA AACCGACTTC 

451 GCGGCAGCCG GCTATTTCAC 

501 GATGGCCAAT ATCTTCGGCG 

551 CGATGCTCTA CACCGTCAGC 

601 GTCGACCCCG TCTTCCTCCC 

651 GCTGAGTGAG CCGAAATCTC 

701 GGGGGCTGCC GGGCAATCCC 

751 CTGGCGCAAA AAGACCGTTT 

801 CATCGGCGCG ACGGTCGAAG 

851 GTTTGCGCGG GTTCGCACTG 

901 TGCCTCCCCA ACCGTTTGAA 

951 CGGCGCGGGT AGTTCGCTTT 

1001 GCTTTCAAAA AATCAAAACC 

1051 GCCATTTTCG GCGGCGTGTG 

1101 ATTTTTCAAA AAACACGACA 

1151 GCCACGCCGA CTATCCCGAA 

1201 ACCGAATACG GCCTGCCCGC 

1251 GCACACCCAA TtcttcgACC 

1301 TGAAAGGCAC GGAAGTCATC 

1351 AACCTCAATG AAACCTTCCG 

1401 GCACTTCAAA ATCAAATAA 



CTCCGAACAA TGGACGCTGC CGCCATTCCT 
CCCTGCTGAT ACTGCTGGCC CCCAATGCGG 
CTGACCGCCA CCGCCCGCCC GATTGTCAAT 
GCTGCTGATC GCCCTGCCTT GGCGTTTCGT 
CGTTTTGGCC GGCGGTTTTG TTTGACGGGC 
TTCCCTTTTA TGGACCTCAT CGGCGCCATC 
GACCGCCCCC GCCCCTTATC AGATAATGAC 
TGCTGGCGAT GCCGTTTGTG TTGCAAAAAG 
CGACACATTG CCGTCTGTGC CGCCGTTGTG 
CGGCCATTTG AGTTACTACG ACCGGGGGCG 
CAAACAACTT CTATTACGCc aAAAGTCAGG 
CAGAATGCCG ACTTTATTAC CGCCGgcctG 
CTTGGGCAAT CAGCAGCGTG CCGCCACGCG 
AAAAAATCCT CTTTATCGTC GCCGAATCTT 
GAGCTTCAAA ACGCCACTTT TGCCAAACTG 
TTCGGTTTGG GAAAGCGGCA GTTTTCCCTT 
GCGAAATGCG CGAATTGTGC GCCTACGGCG 
CGCCGCGCGC CCGACGAAAA ATTTGCCCGC 
ACAAGAAGGT TACGCCACCT TTGCGATGCA 
ACGACCGCTT CAGCTGGTAT CCGAGGGCGG 
GCCGAAAACC TGATCGGTAA AAAAACCTGC 
CGACAGCGAG CTGTTCGGCG AAGTGTCGGC 
AGGGACTGTT TTACTGGATG ACGCTGACCA 
TCCGACATTT TCAACCACAG GCTCAAATGC 
CGAAACCGAC CTCTGCCGCA ATTTCAGCCT 
AACTGGCGGA TTTGATCCGA CGCCCCGAAA 
ATCGTCGGCG ACCATCCGCC GCCCGTCGGC 
CTACCTCAAA CAGGGACACG TCGCCTGGCT 



This encodes a protein having amino acid sequence <SEQ ID 476; ORF48ng-l>: 
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1 MNIHALLSEQ WTLPPFLPKR LLLSLLILLA PNAVFWVLAL LTATARPIVN 

51 LDYLPAALLI ALPWRFVKIA GVLAFWPAVL FDGLMMVIQL FPFMDLIGAI 

101 NLVPFILTAP APYQIMTGLL LLYMLAMPFV LQKAAVKTDF RHIAVCAAW 

151 AAAGYFTGHL SYYDRGRMAN IFGANNFYYA KSQAMLYTVS QNADFITAGL 

5 201 VDPVFLPLGN QQRAATRLSE PKSQKILFIV AESWGLPGNP ELQNATFAKL 

251 LAQKDRFSVW ESGSFPFIGA TVEGEMRELC AYGGLRGFAL RRAPDEKFAR 

301 CLPNRLKQEG YATFAMHGAG S3LYDRFSWY PRAGFQKIKT AENLIGKKTC 

351 AIFGGVCDSE LFGEVSAFFK KHDKGLFYWM TLTSHADYPE SDIFNHRLKC 

401 TEYGLPAETD LCRNFSLHTQ FFDQLADLIR RPEMKGTEVI IVGDHPPPVG 

10 451 NLNETFRYLK QGHVAWLHFK IK* 

ORG48ng-l and ORF48-1 show 97.9% identity in 472 aa overlap: 

10 20 30 40 50 60 

orf4 8-l.pep MNIHTLLSKQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI 

15 orf4 8ng-l MNIHALLSEQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI 



orf4 8-l.pep ALPWRFVKIAGVLAFWLAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGLL 

orf4 8ng-l ALPWRFVKIAGVLAFWPAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGLL 
70 80 90 100 110 120 

130 140 150 160 170 180 

orf 4 8-1 . pep LLYMLMPFVLQKAAAKTDFRHIAVCAAWAAAGYFTGHLSYYDRGRMANIFGANNFYYA 

orf48ng-l LLYMLAMPFVLQKAAVKTDFRHIAVCAAWAAAGYFTGHLSYYDRGRMANIFGANNFYYA 
130 140 150 160 170 180 

190 200 210 220 230 240 

orf 48-1 . pep KSQAMLYTVSQNADFITAGLVDPVFLPLGNQQRAATHLNEPKSQKILFIVAESWGLPANP 
I I I I! I I I I I I I I I I I I I I I I I I ! I I I I ! I I I I I I : I : I I I I I I I I I I I I I I I I I h I I 
orf48ng-l KSQAMLYTVSQNADFITAGLVDPVFLPLGNQQRAATRLSEPKSQKILFIVAESWGLPGNP 
190 200 210 220 230 240 

250 260 270 280 290 300 

orf 48-1. pep ELQNATFAKLLAQKDRFSVWESGSFPFIGATVEGEMRELCAYGGLRGFALRRAPDEKFAR 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf48ng-l ELQNATFAKLLAQKDRFSVWESGSFPFIGATVEGEMRELCAYGGLRGFALRRAPDEKFAR 
250 260 270 280 290 300 

310 320 330 340 350 360 

orf 48-1. pep CLPNRLKQEGYATFAMHGAGSSLYDRFSWYPRAGFQEIKTAENLXGKKTCAIFGGVCDSE 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I 
orf4 8ng-l CLPNRLKQEGYATFAMHGAGSSLYDRFSWYPRAGFQKIKTAENLIGKKTCAIFGGVCDSE 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf 48-1. pep LFGEVSAFFKKHDKGLFYWMTLTSHADYPESDIFNHRLKCTEYGLPAETDLCRNFSLHTQ 

orf48ng-l LFGEVSAFFKKHDKGLFYWMTLTSHADYPESDIFNHRLKCTEYGLPAETDLCRNFSLHTQ 
370 380 390 400 410 420 

430 440 450 460 470 

orf 4 8-1. pep FFDQLADLIQRPEMKGTEVIIVGDHPPPVGNLNETFRYLKQGHVAWLNFKIKX 
I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I 
orf4 8ng-l FFDQLADLIRRPEMKGTEVIIVGDHPPPVGNLNETFRYLKQGHVAWLHFKIKX 

430 440 450 460 470 

Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and two putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 
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Example 57 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 477>: 

1 ..GTGAGCGGAC GTTACCGCGC TTTGGATCGC GTTTCCAAAA TCATCATCGT 

51 TACTTTGAGT ATCGCCACGC TTGCCGCCGC CGGCATCGCT ATGTCGCGCG 

101 GTATGCAGAT GCAGTCCGAT TTTATCGAGC CGACACCGTG GACGCTTGCC 

151 GGTTTGGGCT TCCTGATCGC GCTGATGGGC TGGATGCCCG CGCCGATTGA 

201 AATTTCCGCC ATCAATTCTT TGTGGGTAAC CGAAAAACAA CGCATCAATC 

251 CTTCCGAATA CCGCGACGGG ATTTTTGAAT TCAACGTCGG TTATATCGCC 

301 AGTGCGGTTT TGGCTTTGGT TTTCCTTGCA CTGGGCGC.G TAGCGCCGAA 

351 CGGCAACGGC GA.ACAGTGC AGATGGCGGG CGGCAAATAT AACGGGCAAT 

401 TGATCAATAT GTACGCC . . 

This corresponds to the amino acid sequence <SEQ ID 478; ORF53>: 

1 ..VSGRYRALDR VSKIIIVTLS lATLAAAGIA MSRGMQMQSD FIEPTPWTLA 
51 GLGFLIALMG WMPAPIEISA INSLWVTEKQ RINPSEYRDG IFEFNVGYIA 
101 SAVLALVFLA LGXVAPNGNG XTVQMAGGKY NGQLINMYA. . 

Further work revealed the complete nucleotide sequence <SEQ E) 479>: 



1 ATGTCCGAAC AACATATTTC GACTTGGAAA AGTAAAATCA ACGCATTGGG 

51 TCCGGGGATC ATGATGGCTT CGGCGGCGGT CGGCGGTTCG CACCTGATTG 

101 CCTCGACGCA GGCGGGCGCG CTTTACGGCT GGCAGATCGC GCTCATCATC 

151 ATCCTGACCA ACCTCTTCAA ATACCCGTTT TTCCGCTTCA GCGCGCATTA 

201 CACGCTGGAC ACGGGCAAGA GCCTGATTGA AGGTTATGCC GAGAAAAGCC 

251 GCGTTTATTT GTGGGTATTC CTGATTTTGT GCATCCTCTC CGCCACGATT 

301 AACGCGGGCG CGGTCGCCAT TGTAACCGCC GCCATCGTCA AAATGGCGAT 

351 TCCCTCGCTG ATGTTTGATG CCGGCACGGT TGCCGCCTTG ATTATGGCAT 

4 01 CCTGCCTGAT TATTTTGGTG AGCGGACGTT ACCGCGCTTT GGATCGCGTT 

4 51 TCCAAAATCA TCATCGTTAC TTTGAGTATC GCCACGCTTG CCGCCGCCGG 

501 CATCGCTATG TCGCGCGGTA TGCAGATGCA GTCCGATTTT ATCGAGCCGA 

551 CACCGTGGAC GCTTGCCGGT TTGGGCTTCC TGATCGCGCT GATGGGCTGG 

601 ATGCCCGCGC CGATTGAAAT TTCCGCCATC AATTCTTTGT GGGTAACCGA 

651 AAAACAACGC ATCAATCCTT CCGAATACCG CGACGGGATT TTTGATTTCA 

701 ACGTCGGTTA TATCGCCAGT GCGGTTTTGG CTTTGGTTTT CCTTGCACTG 

751 GGCGCGTTTG TGCAATACGG CAACGGCGAA GCAGTGCAGA TGGCGGGCGG 

801 CAAATATATC GGGCAATTGA TCAATATGTA CGCCGTTACC ATCGGCGGCT 

851 GGTCGCGCCC GCTGGTGGCG TTTATCGCGT TTGCCTGTAT GTACGGCACG 

901 ACGATTACCG TCGTGGACGG CTATGCCCGT GCCATTGCCG AACCCGTGCG 

951 CCTGCTGCGC GGAAAAGACA AAACGGGCAA CGCCGAATTC TTTGCCTGGA 

1001 ATATTTGGGT GGCGGGCAGC GGTTTGGCGG TGATTTTCTG GTTTGACGGC 

1051 GTAATGGCGA ATCTGCTCAA ATTTGCGATG ATTGCCGCTT TTGTGTCCGC 

1101 CCCTGTGTTT GCCTGGCTGA ATTACCGTTT GGTTAAAGGT GATGAAAAAC 

1151 ACAAACTCAC ATCAGGTATG AATGCCCTTG CATTGGCAGG CTTGATTTAT 

1201 CTGACCGGTT TTACCGTTTT GTTCTTATTG AATTTGGCGG GAATGTTCAA 

1251 ATGA 

This corresponds to the amino acid sequence <SEQ ID 480; ORF53-l>: 



1 MSEQHISTWK SKINALGPGI MMASAAVGGS HLIASTQAG A LYGWQIALII 

51 ILTNLF KYPF FRFSAHYTLD TGKSLIEGYA EKSRVYLW VF LILCILSATI 

101 NAGAV AIVTA AIVKMAIPSL MFD AGTVAAL IMASCLIILV SGRYRALDRV 

151 SK IIIVTLSI ATLAAAGIA M SRGMQMQSDF lEPTPW TLAG LGFLIALMGW 

201 MPAPIEISAI NSLWVTEKQR INPSEYRDGI FDFHVGY IAS AVLALVFLAL 

251 GAFV QYGNGE AVQMAGGKYI GQLINMYAVT IGGWSRPL VA FIAFACMYGT 

301 TITW DGYAR AIAEPVRLLR GKDKTGNAE F FAWNIWVAGS GLAVIF WFDG 

351 VMAN LLKFAM lAAFVSAPVF A WLNYRLVKG DEKHKLTSGM N ALALAGLIY 

401 LTGFTVLFL L NLAGMFK* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N.meninsitidis ("strain A) 

ORF53 shows 93.5% identity over a 139aa overlap with an ORF (ORF53a) from strain A of TV. 
meningitidis: 
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VSGRYRALDRVSK IIIVTLSIATLAAAGIA 
AAIVKMAIPSLMFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTL5IATLAAAGIA 



MSRGMQMQSDFIEPTPW TLAGLGFLIALMGWMPA PIEISAINSLWVTEKQRINPSEYRDG 
MSRGMQMQSDFIEPTPW TLA6LGFLIALMGWMPA PIEISAINSLWVTEKQRINPSEYRDG 



I FE FNVGY IASAVLALVFLALGXV APNGNGXTVQMAGGKYNGQLINMYA 
I I : I I I I I I I I I I I I I I I I I I I : III : I I I II I II I II I I I I I 
IFDFNVGY IASAVLALVFLALGAFV QYGNGEAVQMAGGKYIGQLINMYAVTIGGWSRPLV 



orf53a AFIAFACMYGTTITW DGYARAIAEPVRLLRGKDKTGNAEFFAWNIWVAGSGLAVIFWFD 
290 300 310 320 330 340 

The complete length ORF53a nucleotide sequence <SEQ ID 481> is: 



1 ATGTCCGAAC AACATATTTC GACTTGGAAA AGTAAAATCA ACGCATTGGG 

51 ACCGGGGATT ATGATGGCTT CGGCGGCGGT CGGCGGTTCG CACCTGATTG 

101 CCTCGACGCA GGCGGGCGCG CTTTACGGCT GGCAGATCGC GCTCATCATC 

151 ATCCTGACCA ACCTCTTCAA ATACCCGTTT TTCCGCTTCA GCGCGCATTA 

201 CACGCTGGAC ACGGGCAAGA GCCTGATTGA AGGTTATGCC GAGAAAAGCC 

251 GCGTTTATTT GTGGGTATTC CTGATTTTGT GCATCCTCTC CGCCACGATT 

301 AACGCGGGCG CGGTCGCCAT TGTAACCGCC GCCATCGTCA AAATGGCGAT 

351 TCCCTCGCTG ATGTTTGATG CCGGCACGGT TGCCGCCTTG ATTATGGCAT 

4 01 CCTGCCTGAT TATTTTGGTG AGCGGACGTT ACCGCGCTTT GGATCGCGTT 

451 TCCAAAATCA TCATCGTTAC TTTGAGTATC GCCACGCTTG CCGCCGCCGG 

501 CATCGCTATG TCGCGCGGTA TGCAGATGCA GTCCGATTTT ATCGAGCCGA 

551 CACCGTGGAC GCTTGCCGGT TTGGGCTTCC TGATCGCGCT GATGGGCTGG 

601 ATGCCCGCGC CGATTGAAAT TTCCGCCATC AATTCTTTGT GGGTAACCGA 

651 AAAACAACGC ATCAATCCTT CCGAATACCG CGACGGGATT TTTGATTTCA 

701 ACGTCGGTTA TATCGCCAGT GCGGTTTTGG CTTTGGTTTT CCTTGCACTG 

751 GGCGCGTTTG TGCAATACGG CAACGGCGAA GCAGTGCAGA TGGCGGGCGG 

801 CAAATATATC GGGCAATTGA TCAATATGTA CGCCGTTACC ATCGGCGGCT 

851 GGTCGCGCCC GCTGGTGGCG TTTATCGCGT TTGCCTGTAT GTACGGCACG 

901 ACGATTACCG TTGTGGACGG CTATGCCCGT GCCATTGCCG AACCCGTGCG 

951 CCTGCTGCGC GGAAAAGACA AAACGGGCAA CGCCGAATTC TTTGCCTGGA 

1001 ATATTTGGGT GGCGGGCAGC GGTTTGGCGG TGATTTTCTG GTTTGACGGC 

1051 GTAATGGCGA ATCTGCTCAA ATTTGCGATG ATTGCCGCTT TTGTGTCCGC 

1101 CCCTGTGTTT GCCTGGCTGA ATTACCGTTT GGTCAAAGGT GATGAAAAAC 

1151 ACAAACTCAC ATCAGGTATG AATGCCCTTG CATTGGCAGG CTTGATTTAT 

1201 CTGACCGGTT TTACCGTTTT GTTCTTATTG AATTTGGCGG GAATGTTCAA 

1251 ATGA 

This encodes a protein having amino acid sequence <SEQ ID 482>: 



1 MSEQHISTWK SKINALGPGI MMASAAVGGS HLIASTQAG A LYGWQIALII 

51 ILTNLF KYPF FRFSAHYTLD TGKSLIEGYA EKSRVYLW VF LILCILSATI 

101 NAGAV AIVTA AIVKMAIPSL MFD AGTVAAL IMASCLIILV SGRYRALDRV 

151 SK IIIVTLSI ATLAAAGIAM SRGMQMQSDF lEPTPW TLAG LGFLIALMGW 

201 MPAPIEISAI NSLWVTEKQR INPSEYRDGI FDFNVGY IAS AVLALVFLAL 

251 GAFV QYGNGE AVQiylAGGKYI GQLINMYAVT IGGWSRPL VA FIAFACMYGT 

301 TITW DGYAR AIAEPVRLLR GKDKTGHAE F FAWNIWVAGS GLAVIF WFDG 

351 VMAN LLKFAM lAAFVSAPVF A WLNYRLVKG DEKHKLTSGM NA LALAGLIY 

401 LTGFTVLFLL NLAGMFK* 

ORF 53a shows 100.0% identity in 417 aa overlap with ORF53-1: 

10 20 30 40 50 60 

orf53a.pep MSEQHISTWKSKINALGPGIMMASAAVGGSHLIASTQAGALYGWQIALII ILTNLFKYPF 

orf53-l MSEQHISTWKSKINALGPGIMMASAAVGGSHLIASTQAGALYGWQIALIIILTNLFKYPF 
10 20 30 40 50 60 



80 



90 100 110 120 
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orf 53a . pep FRFSAHYTLDTGKSLIEGYAEKSRVYLWVFLILCILSATINAGAVAIVTAAIVKMAIPSL 

orf53-l FRFSAHYTLDTGKSLIEGYAEKSRVYLWVFLILCILSATINAGAVAIVTAAIVKMAIPSL 
70 80 90 100 110 120 



130 140 150 160 170 180 

orf 53a. pep MFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTLSIATLAAAGIAMSRGMQMQSDF 

orf 53-1 MFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTLSIATLAAAGIAMSRGMQMQSDF 
130 140 150 160 170 180 



190 200 210 220 230 240 

orf 53a. pep lEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDGIFDFNVGYIAS 

orf 53-1 lEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDGIFDFNVGYIAS 
190 200 210 220 230 240 



)r f 53a . pep AVLALVFLALGAFVQYGNGEAVQMAGGKYIGQLINMYAVTIGGWSRPLVAFIAFACMYGT 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
)rf53-l AVLALVFLALGAFVQYGNGEAVQMAGGKYIGQLINMYAVTIGGWSRPLVAFIAFACMYGT 



TITWDGYARAIAEPVRLLRGKDKTGNAEFFAWNIWVAGSGLAVIFWFDGVMANLLKFAM 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
TITWDGYARAIAEPVRLLRGKDKTGNAEFFAWNIWVAGSGLAVIFWFDGVMANLLKFAM 



irf 53a . pep lAAFVSAPVFAWLNYRLVKGDEKHKLTSGMNALALAGLIYLTGFTVLFLLNLAGMFKX 
.rf53-l lAAFVSAPVFAWLNYRLVKGDEKHKLTSGMNALALAGLIYLTGFTVLFLLNLAGMFKX 



Homology with a predicted ORF from N.sonorrhoeae 

ORF53 shows 92.1% identity over a 139aa overlap with a predicted ORF (ORF53ng) from N. 
gonorrhoeae: 

orf 53. pen VSGRYRALDRVSKIIIVTL3IATLAAAGIA 30 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf53ng AAIVKMAIPSLMFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTL3IATLAAAGIA 91 

orf53.pep MSRGMQMQSDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDG 90 

orf53ng MSRGMQMQPDFIEPTPWTLAGLGFLIALMGWMPAPIEISAIN3LWVTEKQRINPSEYRDG 151 

orf 53. pep IFEFNVGYIASAVLALVFLALGXVAPNGNGXTVQMAGGKYNGQLINMYA 139 

orf53ng IFDFNVGYIASAVLALVFLALGAFVQYGNGEAVQMGGGKYIGQLINMYAVTIGGGSRPLV 211 



An ORF53ng nucleotide sequence <SEQ ID 483> was predicted to encode a protein having amino 
acid sequence <SEQ ID 484>: 



1 MPKKSCVYLW VFLILCIASA TINAGAVAIV TAAIVKMAIP SLMFDAGTVA 

51 ALIMASCLII LVSGRYRALD RVSK IIIVTL SIATLAAAGI AM SRGMQMQP 

101 DFIEPTPW TL AGLGFLIALM GWMPA PIEIS AINSLWVTEK QRINPSEYRD 

151 GIFDFNVGY I ASAVLALVFL ALGAFV QYGN GEAVQMGGGK YIGQLINMYA 

201 VTIGGGSRPL VAFIAFACMY GAASTW DGY ARAIAEPVRL LRGKDKTARP 

251 IVLLEKLGGR HRFGRDFLV* 

Further analysis revealed ftirther partial DNA gonococcal sequence <SEQ ID 485>: 



1 . . aagaAAAGCT GCGTTTATTT GTGGGTTTTT TTGATTTTGT GTATCGCCTC 
51 CGCCACGATT AACGCGGGCG CGGTCGCCAT TGTAACCGCC GCCATCGTCA 
101 AAATGGCGAT TCCCTCGCTG ATGTTTGATG CCGGCACGGT TGCCGCCTTG 
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ATTATGGCAT 
GGATCGTGTT 
CCGCCGCCGG 
ATCGAGCCGA 
GATGGGCTGG 
GGGTAACCGA 
TTCGATTTCA 
CCTTGCACTG 
TGGCGGGCGG 
ATCGGCGGCT 
GTACGGCACG 
AACCCGTGCG 
TTtgccTGGA 
GTTTGACggc 
TTGTGTCCGC 
GACAAACGCC 
CCTGCTCTAC 
GACTTTTGGC 



CCTGCCTGAT 
TCCAAAATCA 
CATCGCTATG 
CACCGTGGAC 
ATGCCCGCGC 
AAAACAACGC 
ACGTCGGTTA 
GGCGCGTTTG 
CAAATATATC 
GGTCTCGTCC 
ACGATTACCG 
CCTGCTGCGC 
ATATTTGGGT 
gcaaTGGCgG 
CCCTGTGTTC 
ACAGGCTTAC 
CTGGCCGGGT 
ATAG 



TATTTTGGTG 
TCATTGTTAC 
TCGCGCGGTA 
GCTTGCCGGT 
CGATCGAAAT 
ATCAATCCTT 
TATCGCcagT 
TGCAATACGG 
GGGCAATTGA 
GCTGGTGGCG 
TTGTGGACGG 
GGCAGGGATA 
GGCGGGCAGC 
AACtgcTCAA 
GCCTGGCTCA 
CGCCGGTATG 
TTGCCGTTTT 



AGCGGACGTT 
TTTGAGCATC 
TGCAGATGCA 
TTGGGCTTCC 
TTCCGCCATC 
CTGAATACCG 
GCGGTTTTGG 
CAACGGCGAA 
TTAATATGTA 
TTTATCGCGT 
TTATGCGCGT 
AAACCGGCAA 
GGTTTGGCGG 
ATTTGCGATG 
ACTACCGCCT 
AACGCCCTTG 
GTTCCTGTTG 



ACCGCGCTTT 
GCCACGCTTG 
GCCCGATTTT 
TGATCGCGCT 
AATTCTTTGT 
CGACGGGATT 
CTTTGGTTTT 
GCAGTGCAGA 
TGCCGTAACC 
TTGCCTGTAT 
GCCATTGCCG 
CGCCGAGTTG 
TGATTTTCTG 
ATtgccgcCT 
CGTCAAAGGG 
CCATTGTCGG 
AACCTTACCG 



This corresponds to the amino acid sequence <SEQ ID 486; ORF53ng-] 



1 ..KKSCVYLWVF LILCIASATI NAGAVAIVTA AIVKMAIPSL MFDAGTVAAL 

51 IMASCLIILV SGRYRALDRV SK IIIVTLSI ATLAAAGIAM SRGMQMQPDF 

101 lEPTPW TLAG LGFLIALMGW MPA PIEISAI NSLWVTEKQR INPSEYRDGI 

151 FDFNVGY IAS AVLALVFLAL GAFV QYGHGE AVQMAGGKYI GQLINMYAVT 

201 IGGWSRPL VA FIAFACMYGT TITW DGYAR AIAEPVRLLR GRDKTGNAEL 

251 FAWNIWVAGS GLAVIF WFDG AMAE LLKFAM lAAFVSAPVF A WLNYRLVKG 

301 DKRHRLTAGM NA LAIVGLLY LAGFAVLFL L NLTGLLA* 

ORF53ng-l and ORF53-1 show 94.0% identity in 336 aa overlap: 



60 70 80 90 100 110 

o r f 5 3 - 1 . pep I LTNLFKYPFFRFSAHYTLDTGKSLIEGYAEKSRVYLWVFLILCILSAT INAGAVAIVTA 

orf53ng-l KKSCVYLWVFL I LC lASAT INAGAVAIVTA 

10 20 30 



)rf 53-1. pep AIVKMAIPSLMFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTLSIATLAAAGIAM 
)rf53ng-l AIVKMAIPSLMFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTLSIATLAAAGIAM 



)rf 53-1. pep SRGMQMQSDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDGI 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
)rf53ng-l SRGMQMQPDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDGI 



240 250 260 270 280 290 

orf 53-1 . pep FDFNVGYIASAVLALVFLALGAFVQYGNGEAVQMAGGKYIGQLINMYAVTIGGWSRPLVA 

orf53ng-l FDFNVGYIASAVLALVFLALGAFVQYGNGEAVQMAGGKYIGQLINMYAVTIGGWSRPLVA 
160 170 180 190 200 210 

300 310 320 330 340 350 

orf 53-1. pep FIAFACMYGTTITVVDGYARAIAEPVRLLRGKDKTGNAEFFAWNIWVAGSGLAVIFWFDG 

orf53ng-l FIAFACMYGTTITWDGYARAIAEPVRLLRGRDKTGNAELFAWNIWVAGSGLAVIFWFDG 
220 230 240 250 260 270 

360 370 380 390 400 410 

orf 5 3-1 .pep VMANLLKFAMIAAFVSAPVFAWLNYRLVKGDEKHKLTSGMNALALAGLIYLTGFTVLFLL 

orf 53ng-l AMAELLKFAMIAAFVSAPVFAWLNYRLVKGDKRHRLTAGMNALAIVGLLYLAGFAVLFLL 
280 290 300 310 320 330 



orf 53-1. pep NLAGMFKX 

I I : I : : 
orf53ng-l NLTGLLAX 
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Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N.gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 58 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 487>: 

1 ..TTGCGGGAAA CGGCATATGT TTTGGATAGT TTTGATCGTT ATTTTGTTGT 

51 TGCGCTTGCC GGCTTGTTTT TTGTCCGCGC ACAATCCGAA CGCGAGTGGA 

101 TGCGCGAGGT TTCTGCGTGG CAGGAAAAGA AAGGGGAAAA ACAGGCGGAG 

151 CTGCCTGAAA TCAAAGACGG TATGCCCGAT TTTCCCGAAC TTGCCCTGAT 

201 GCTTTTCCAC GCCGTCAAAA CGGCAGTGTA TTGGCTGTTT GTCGGTGTCG 

251 TCCGTTTCTG CCGAAACTAT CTGGCGCACG AATCCGAACC GGACAGGCCC 

301 GTTCCGCCT.. 

This corresponds to the amino acid sequence <SEQ ID 488; ORF58>: 



Further work revealed the complete nucleotide sequence <SEQ ID 489>: 

1 ATGTTTTGGA TAGTTTTGAT CGTTATTTTG TTGCTTGCGC TTGCCGGCTT 

51 GTTTTTTGTC CGCGCACAAT CCGAACGCGA GTGGATGCGC GAGGTTTCTG 

101 CGTGGCAGGA AAAGAAAGGG GAAAAACAGG CGGAGCTGCC TGAAATCAAA 

151 GACGGTATGC CCGATTTTCC CGAACTTGCC CTGATGCTTT TCCATGCCGT 

201 CAAAACGGCA GTGTATTGGC TGTTTGTCGG TGTCGTCCGT TTCTGCCGAA 

251 ACTATCTGGC GCACGAATCC GAACCGGACA GGCCCGTTCC GCCTGCTTCT 

301 GCAAACCGTG CGGATGTTCC GACCGCATCC GACGGATATT CAGACAGTGG 

351 AAACGGGACG GAAGAAGCGG AAACGGAAGA AGCAGAAGCT GCGGAGGAAG 

4 01 AGGCTGCCGA TACGGAAGAC ATTGCAACTG CCGTAATCGA CAACCGCCGC 

4 51 ATCCCATTCG ACCGGAGTAT TGCTGAAGGG TTGATGCCGT CTGAAAGCGA 

501 AATTTCGCCC GTCCGTCCGG TTTTTAAAGA AATCACTTTG GAAGAAGCAA 

551 CGCGTGCTTT AAACAGCGCG GCTTTAAGGG AAACGAAAAA ACGCTATATC 

601 GATGCATTTG AGAAAAACGA AACAGCGGTC CCCAAAGTCC GCGTGTCCGA 

651 TACCCCGATG GAAGGGCTGC AGATTATCGG TTTGGACGAC CCTGTGCTTC 

7 01 AACGCACGTA TTCCCATATG TTCGATGCGG ACAAAGAAGC GTTTTCCGAG 

751 TCTGCGGATT ACGGATTTGA GCCGTATTTT GAGAAGCAGC ATCCGTCTGC 

801 CTTTTCTGCA GTCAAAGCCG AAAATGCACG GAATGCGCCG TTCCACCGTC 

851 ATGCAGGGCA GGGGAAAGGG CAGGCGGAGG CAAAATCCCC GGATGTTTCC 

901 CAAGGGCAGT CCGTTTCAGA CGGCACGGCC GTCCGCGATG CCCGCCGCCG 

951 CGTTTCCGTC AATTTGAAAG AACCGAACAA GGCAACGGTT TCTGCGGAGG 

1001 CGCGAATTTC TCGCCTGATT CCGGAAAGTC AGACGGTTGT GGGGAAAGGG 

1051 GATGTCGAAA TGCCGTCTGA AACCGAAAAT GTTTTCACGG AAACCGTTTC 

1101 GTCTGTGGGA TACGGCGGTC CGGTTTATGA TGAAACTGCC GATATCCATA 

1151 TTGAAGAACC TGCCGCGCCC GATGCTTGGG TGGTCGAACC ACCCGAAGTG 

1201 CCGAAAGTTC CCATGACCGC AATCGATATT CAGCCGCCGC CTCCCGTATC 

1251 GGAAATCTAC AACCGTACCT ATGAACCGCC GTCAGGATTC GAGCAGGTGC 

1301 AACGCAGCCG CATTGCCGAG ACCGACCATC TTGCCGATGA TGTTTTGAAT 

1351 GGAGGTTGGC AGGAGGAAAC CGCCGCTATT GCGGATGACG GCAGTGAAGG 

1401 TGCGGCAGAG CGGTCAAGCG GGCAATATCT GTCGGAAACC GAAGCGTTCG 

1451 GGCATGACAG TCAGGCGGTT TGTCCGTTTG AAAATGTGCC GTCTGAACGC 

1501 CCGTCCTGCC GGGTATCGGA TACGGAAGCG GATGAAGGGG CGTTCCCATC 

1551 TGAAGAAACC GGTGCGGTAT CCGAACACCT GCCGACAACC GACCTGCTTC 

1601 TGCCTCCGCT GTTCAATCCC GAGGCGACGC AAACGGAAGA AGAACTGTTG 

1651 GAAAACAGCA TCACCATCGA AGAAAAATTG GCGGAGTTCA AAGTCAAGGT 

1701 CAAGGTTGTC GATTCTTATT CCGGCCCCGT AATTACGCGT TATGAAATCG 

1751 AACCCGATGT CGGCGTGCGC GGCAATTCCG TTCTGAATCT GGAAAAAGAT 

1801 TTGGCGCGTT CGCTCGGCGT GGCTTCCATC CGCGTTGTCG AAACCATCCC 

1851 CGGCAAAACC TGCATGGGTT TGGAACTTCC GAACCCGAAA CGCCAAATGA 
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1901 TACGCCTGAG CGAAATCTTC AATTCGCCCG AGTTTGCCGA ATCCAAATCC 

1951 AAGCTGACGC TCGCGCTCGG TCAGGACATC ACCGGACAGC CCGTCGTAAC 

2001 CGACTTGGGA AAAGCACCGC ATTTGTTGGT TGCCGGCACG ACCGGTTCGG 

2051 GCAAATCGGT GGGTGTCAAC GCGATGATTC TGTCTATGCT TTTCAAAGCC 

2101 GCGCCGGAAG ACGTGCGTAT GATTATGATC GATCCGAAAA TGCTGGAATT 

2151 GAGCATTTAC GAAGGCATCC CGCACCTGCT CGCCCCTGTC GTTACCGATA 

2201 TGAAGCTGGC GGCAAACGCG CTGAACTGGT GTGTTAACGA AATGGAAAAA 

2251 CGCTACCGCC TGATGAGCTT TATGGGCGTG CGTAATCTTG CGGGCTTCAA 

2301 TCAAAAAATC GCCGAAGCCG CAGCAAGGGG AGAAAAAATC GGCAATCCGT 

2351 TCAGCCTCAC GCCCGACGAT CCCGAACCTT TGGAAAAACT GCCGTTTATC 

2401 GTGGTCGTGG TCGATGAGTT TGCCGACCTG ATGATGACGG CAGGCAAGAA 

2451 AATCGAAGAA CTGATTGCCC GCCTCGCCCA AAAAGCCCGC GCGGCAGGCA 

2501 TCCATTTGAT TCTTGCCACA CAACGCCCCA GCGTCGATGT CATCACGGGT 

2551 CTGATTAAGG CGAACATCCC GACGCGTATC GCGTTCCAAG TGTCCAGCAA 

2 601 AATCGACAGC CGCACGATTC TCGACCAAAT GGGCGCGGAA AACCTGCTCG 

2 651 GTCAGGGCGA TATGCTGTTC CTGCTGCCGG GTACTGCCTA TCCGCAGCGC 

2701 GTTCACGGCG CGTTTGCCTC GGATGAAGAG GTGCACCGCG TGGTCGAATA 

2751 TTTGAAACAG TTTGGCGAAC CGGACTATGT TGACGATATT TTGAGCGGCG 

2801 GCGGCAGCGA AGAGCTGCCC GGCATCGGGC GCAGCGGCGA CGACGAAACC 

2851 GATCCGATGT ACGACGAGGC CGTATCCGTT GTCCTGAAAA CGCGCAAAGC 

2 901 CAGCATTTCG GGCGTACAGC GCGCCTTGCG TATCGGCTAC AACCGCGCCG 

2 951 CGCGTCTGAT TGACCAGATG GAGGCGGAAG GCATTGTGTC CGCACCGGAA 

3001 CACAACGGCA ACCGTACGAT TCTCGTCCCC TTGGACAATG CTTGA 

This corresponds to the amino acid sequence <SEQ ID 490; ORF58-l>: 



1 MFWIVLIVIL LLALAGLFFV RAQS EREWMR EVSAWQEKKG EKQAELPEIK 

51 DGMPDFPELA LM LFHAVKTA VYWLFVGWR FCRNYLAHES EPDRPVPPAS 

101 ANRADVPTAS DGYSDSGNGT EEAETEEAEA AEEEAADTED lATAVIDNRR 

151 IPFDRSIAEG LMPSESEISP VRPVFKEITL EEATRALNSA ALRETKKRYI 

201 DAFEKNETAV PKVRVSDTPM EGLQIIGLDD PVLQRTYSHM FDADKEAFSE 

251 SADYGFEPYF EKQHPSAFSA VKAENARNAP FHRHAGQGKG QAEAKSPDVS 

301 QGQSVSDGTA VRDARRRVSV NLKEPNKATV SAEARISRLI PESQTWGKR 

351 DVEMPSETEN VFTETVSSVG YGGPVYDETA DIHIEEPAAP DAWWEPPEV 

401 PKVPMTAIDI QPPPPVSEIY NRTYEPPSGF EQVQRSRIAE TDHLADDVLN 

451 GGWQEETAAI ADDGSEGAAE RSSGQYLSET EAFGHDSQAV CPFENVPSER 

501 PSCRVSDTEA DEGAFPSEET GAVSEHLPTT DLLLPPLFNP EATQTEEELL 

551 ENSITIEEKL AEFKVKVKVV DSYSGPVITR YEIEPDVGVR GNSVLNLEKD 

601 LARSLGVASI RWETIPGKT CMGLELPNPK RQMIRLSEIF NSPEFAESKS 

651 KLTLALGQDI TGQPVVTDLG KAPHLLVAGT TGSGKSVGVN AMILSMLFKA 

701 APEDVRMIMI DPKMLELSIY EGIPHLLAPV VTDMKLAANA LNWCVNEMEK 

751 RYRLMSFMGV RNLAGFNQKI AEAAARGEKI GNPFSLTPDD PEPLEK LPFI 

801 WWDEFADL MMT AGKKIEE LIARLAQKAR AAGIHLILAT QRPSVDVITG 

851 LIKANIPTRI AFQVSSKIDS RTILDQMGAE NLLGQGDMLF LLPGTAYPQR 

901 VHGAFASDEE VHRWEYLKQ FGEPDYVDDI LSGGGSEELP GIGRSGDDET 

951 DPMYDEAVSV VLKTRKASIS GVQRALRIGY NRAARLIDQM EAEGIVSAPE 

1001 HNGNRTILVP LDNA* 

Computer analysis of this amino acid sequence predicts the indicated transmembrane region, and 
also gave the following results: 



Homology with a predicted ORF from N.meninsitidis (strain A) 

ORFS 8 shows 96.6% identity over a 89aa overlap with an ORF (ORFS 8a) from strain A of A^. 
meningitidis: 

10 20 30 40 50 60 

orf 58 .pep LRETAYVLDSFDRYFVV ALAGLFFVRAQS EREWMREVSAWQEKKGEKQAELPEIKDGMPD 

o r f 5 8 a MFWIVLIVILLLALAGLFFVRAOS EREWMREVSAWQEKKGEKQAELPE IKDGMPD 



)rf 58 .pep FPELALM LFHAVKTAVYWLFVGW RFCRHYLAHESEPDRPVPP 

)rf5aa FPELALM LFHAVKTAVYWLFVGW RFCRNYLAHESEPDRPVPPASANRADVPTASDGYSD 
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The complete length ORF58a nucleotide sequence <SEQ ID 491> is: 

1 ATGTTTTGGA TAGTTTTGAT CGTTATTTTG TTGCTTGCGC TTGCCGGCTT 

51 GTTTTTTGTC CGCGCACAAT CCGAACGCGA GTGGATGCGC GAGGTTTCTG 

101 CGTGGCAGGA AAAGAAAGGG GAAAAACAGG CGGAGCTGCC TGAAATCAAA 

151 GACGGTATGC CCGATTTTCC CGAACTTGCC CTGATGCTTT TCCATGCCGT 

201 CAAAACGGCA GTGTATTGGC TGTTTGTCGG TGTCGTCCGT TTCTGCCGAA 

251 ACTATCTGGC GCACGAATCC GAACCGGACA GGCCCGTTCC GCCTGCTTCT 

301 GCAAATCGTG CGGATGTTCC GACCGCATCC GACGGATATT CAGACAGTGG 

351 AAACGGGACG GAAGAAGCGG AAACGGAAGA AGCAGAAGCT GCGGAGGAAG 

401 AGGCTGCCGA TACGGAAGAC ATTGCAACTG CCGTAATCGA CAACCGCCGC 

4 51 ATCCCATTCG ACCGGAGTAT TGCTGAAGGG TTGATGCCGT CTGAAAGCGA 

501 AATTTCGCCC GTCCGTCCGG TTTTTAAGGA AATCACTTTG GAAGAAGCAA 

551 CGCGTGCTTT AAACAGCGCG GCTTTAAGGG AAACGAAAAA ACGCTATATC 

601 GATGCATTTG AGAAAAACGA AACAGCGGTC CCCAAAGTCC GCGTGTCCGA 

551 TACCCCGATG GAAGGGCTGC AGATTATCGG TTTGGACGAC CCTGTGCTTC 

701 AACGCACGTA TTCCCGTATG TTCGATGCGG ACAAAGAAGC GTTTTCCGAG 

751 TCTGCGGATT ACGGATTTGA GCCGTATTTT GAGAAGCAGC ATCCGTCTGC 

801 CTTTTCTGCA GTCAAAGCCG AAAATGCACG GAATGCGCCG TTCCGCCGTC 

851 ATGCAGGGCA GGGNAAAGGG CAGGCGGAGG CNAAATCCCC GGATGTTTCC 

901 CAAGGGCAGT CCGTTTCAGA CGGCACAGCC GTCCGCGATG CCNGCCGCCG 

951 CGTTTCCGTC AATTTGAAAG AACCGAACAA GGCAACGGTT TCTGCGGAGG 

1001 CGCGGATTTC GCGCCTGATT CCGGAAAGTC GGACGGTTGT CGGGAAACGG 

1051 GATGTCGAAA TGCCGTCTGA AACCGAAAAT GTTTTCACGG AAANTGTTTC 

1101 GTCTGTGGGA TACGGCGNTC CGGTTTATGA TGAAACTGCC GATATCCATA 

1151 TTGAAGAACC TGCCGCGCCC GATGCTTGGG TGGTCGAACC ACCCGAAGTG 

1201 CCGAAAGTTC CCATGCCCGC AATNGATATT CCGCCGCCGC CTCCCGTATC 

1251 GGAAATCTAC AACCGTACCT ATGAACCGCC GGCAGGATTC GAGCAGGTGC 

1301 AACGCAGCCG CATTGCCGAA ACCGATCATC TTGCCGATGA TGTTTTGAAT 

1351 GGAGGTTGGC AGGAGGAAAC CGCCGCTATT GCGAATGACG GCAGTGAGGG 

1401 TGTGGCAGAG CGGTCAAGCG GGCAATATTT GTCGGAAACC GAAGCGTTCG 

1451 GGCATGACAG TCAGGCGGTT TGTCCGTTTG AAAATGTGCC GTCTGAACGC 

1501 CCGTCCCGCC GGGCATNGGA TACGGAAGCG GATGAAGGGG CGTTCCAATC 

1551 TGAAGAAACC GGTGCGGTAT CCGAACACCT GCCGACAACC GACCTGCTTC 

1601 TGCCGCCGCT GTTCAATCCC GGGGCGACGC AAACGGAAGA AG7\NCTGTTG 

1651 GANAACAGCA TCACCATCGA AGAAAAATNG GCGGAGTTCA AAGTCAAGGT 

1701 CAAGGTTGTC GATTCTTATT CCGGCCCCGT GATTACGCGT TATGAAATCG 

1751 AACCCGATGT CGGCGTGCGC GGCAATTCCG TTCTAAATCT GGAAAAAGAN 

1801 TTGGCGCGTT CGCTCGGCGT GGCTTCCATC CGCGTTGTCG AAACCATCCT 

1851 CGGCAAAACC TGTATGGGTT TGGAACTTCC GAACCCGAAA CGCCAAATGA 

1901 TACGGCTGAG CGAAATCTTC AATTCGCCCG AGTTTGCCGA ATCCAAATCC 

1951 AAGCTGACGC TCGCGCTCGG TCAGGACATC ACCGGACAGC CCGTCGTAAC 

2001 CGACTTGGGC AAAGCACCGC ATTTGTTGGT TGCCGGCACG ACCGGTTCGG 

2051 GCAAATCGGT GGGTGTCAAC GCGATGATTC TGTCTATGCT TTTCA7\AGCC 

2101 GCGCCGGAAG ACGTGCGTAT GATTATGATC GATCCGAAAA TGCTGGAATT 

2151 GAGCATTTAC GAAGGCATCC CGCACCTGCT CGCCCCTGTC GTTACCGATA 

2201 TGAAGCTGGC GGCAAACGCG CTGAACTGGT GTGTTAACGA AATGGAAAAA 

2251 CGCTACCGCC TGATGAGCTT TATGGGCGTG CGCAATCTTG CGGGTNTCAA 

2301 TCAAAAAATC GCCGAAGCCG CAGCAAGGGG GGAGAAAATC GGCAACCCGT 

2351 TCAGCCTCAC GCCCGACAAT CCCGAACCTT TGGANAAATT GCCGTTTATC 

2401 GTGGTCGTGG TTGATGAGTT TGCCGACCTG ATGATGACGG CAGGCAAGAA 

2451 AATCGAAGAA CTGATTGCCC GCCTCGCCCA AAAAGCCCGC GCGGCAGGCA 

2501 TCCATCTTAT CCTTGCCACA CAACGCCCCA GTGTCGATGT CATCACGGGT 

2551 CTGATTAAGG CGAACATCCC GACGCGTATC GCGTTCCAAG TGTCCAGCAA 

2601 AATCGACAGC GGCAGGATTC TTGACCAAAT GGGTGCGGAA AACCTGCTCG 

2651 GGCAGGGCGA TATGCTGTTC CTGCCGCCGG GTACGGCCTA TCCGCAGCGC 

27 01 GTTCACGGCG CGTTTGCCTC GGATGAAGAG GTGCACCGCG TGGTCGAATA 

27 51 TCTGAAACAG TTTGGCGAAC CGGACTATGT TGACGATATN TTGAGCGGCG 

2801 GTATGTCCGA CGATTTGCTG GGAATCAGCC GGAGCGGCGA CGGCGAAACC 

2851 GATCCGATGT ACGACGAGGC CGTGTCNGTT GTTTTGAAAA CGCGCAAAGC 

2 901 CAGCATTTCT GGCGTGCAGC GCGCATTGCG TATCGGCTAT AATCGCGCCG 

2 951 CGCGTCTGAT TGACCAGATG GAGGCGGAAG GCATTGTGTC CGCACCGGAA 

3001 CACAACGGCA ACCGTACGAT TCTCGTCCCC TTNGACAATG CTTGA 

This encodes a protein having amino acid sequence <SEQ ID 492>: 

1 MFWIVLIVIL LLALAGLFFV RAQS EREWMR EVSAWQEKKG EKQAELPEIK 

51 DGMPDFPELA LM LFHAVKTA VYWLFVGVV R FCRNYLAHES EPDRPVPPAS 

101 ANRADVPTAS DGYSDSGNGT EEAETEEAEA AEEEAADTED lATAVIDNRR 

151 IPFDRSIAEG LMPSESEISP VRPVFKEITL EEATRALNSA ALRETKKRYI 

201 DAFEKNETAV PKVRVSDTPM EGLQIIGLDD PVLQRTYSRM FDADKEAFSE 
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SADYGFEPYF 
QGQSVSDGTA 
DVEMPSETEN 
PKVPMPAXDI 
GGWQEETAAI 
PSRRAXDTEA 
XNSITIEEKX 
LARSLGVASI 
KLTLALGQDI 
APEDVRMIMI 
RYRLMSFMGV 
WVVDSFADL 
LIKANIPTRI 
VHGAFASDEE 
DPMYDEAVSV 
HNGNRTILVP 



EKQHPSAFSA 
VRDAXRRVSV 
VFTEXVSSVG 
PPPPPVSEIY 
ANDGSEGVAE 
DEGAFQSEET 
AEFKVKVKW 
RWETILGKT 
TGQPWTDLG 
DPKMLELSIY 
RNLAGXNQKI 
MMTAGKKIEE 
AFQVSSKIDS 
VHRVVEYLKQ 
VLKTRKASIS 
XDNA* 



VKAENARNAP 
NLKEPNKATV 
YGXPVYDETA 
NRTYEPPAGF 
RSSGQYLSET 
GAVSEHLPTT 
DSYSGPVITR 
CMGLELPNPK 
KAPHLLVAGT 
EGIPHLLAPV 
AEAAARGEKI 
LIARLAQKAR 
RTILDQMGAE 
FGEPDYVDDX 
GVQRALRIGY 



FRRHAGQGKG 
SAEARISRLI 
DIHIEEPAAP 
EQVQRSRIAE 
EAFGHDSQAV 
DLLLPPLFNP 
YEIEPDVGVR 
RQMIRLSEIF 
TGSGKSVGVN 
VTDMKLAANA 
GNPFSLTPDN 
AAGIHLILAT 
NLLGQGDMLF 
LSGGMSDDLL 
NRAARLIDQM 



QAEAKSPDVS 
PESRTWGKR 
WDAWWEPPEV 
TDHLADDVLN 
CPFENVPSER 
GATQTEEXLL 
GNSVLNLEKX 
NSPEFAESKS 
AMILSMLFKA 
LNWCVNEMEK 
PEPLXKLPFI 
QRPSVDVITG 
LPPGTAYPQR 
GISRSGDGET 
EAEGIVSAPE 



ORF58a and ORF58-1 show 96.6% identity in 1014 aa overlap: 



orf56-l 



MFWIVLIVILLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPDFPELA 
MFWIVLIVILLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPDFPELA 



irf58a.pep 
.rf58-l 



LMLFHAVKTAVYWLFVGVVRFCRNYLAHESEPDRPVPPASANRADVPTASDGYSDSGNGT 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
LMLFHAVKTAVYWLFVGWRFCRNYLAHESEPDRPVPPASANRADVPTASDGYSDSGNGT 



orf58a.pep 
orf58-l 



EEAETEEAEAAEEEAADTEDIATAVIDNRRIPFDRSIAEGLMPSESEISPVRPVFKEITL 
EEAETEEAEAAEEEAADTEDIATAVIDNRRIPFDRSIAEGLMPSESEISPVRPVFKEITL 



EEATRALNSAALRETKKRYIDAFEKNETAVPKVRVSDTPMEGLQIIGLDDPVLQRTYSRM 
EEATRALNSAALRETKKRYIDAFEKNETAVPKVRVSDTPMEGLQIIGLDDPVLQRTYSHM 



orf58a.pep 
orf58-l 



FDADKEAFSESADYGFEPYFEKQHPSAFSAVKAENARNAPFRRHAGQGKGQAEAKSPDVS 
FDADKEAFSESADYGFEPYFEKQHPSAFSAVKAENARNAPFHRHAGQGKGQAEAKSPDVS 



orf5Ba.pep 
orf58-l 



QGQSVSDGTAVRDAXRRVSVNLKEPNKATVSAEARISRLIPESRTWGKRDVEMPSETEN 
QGQSVSDGTAVRDARRRVSVNLKEPNKATVSAEARISRLIPESQTWGKRDVEMPSETEN 



>rf58a.pep 
>rf58-l 



VFTEXVSSVGYGXPVYDETADIHIEEPAAPDAWWEPPEVPKVPMPAXDIPPPPPVSEIY 
VFTETVSSVGYGGPVYDETADIHIEEPAAPDAWWEPPEVPKVPMTAIDIQPPPPVSEIY 



orf58a.pep 
orf58-l 



NRTYEPPAGFEQVQRSRIAETDHLADDVLNGGWQEETAAIANDGSEGVAERSSGQYLSET 
NRTYEPPSGFEQVQRSRIAETDHLADDVLNGGWOEETAAIADDGSEGAAERSSGQYLSET 



orf58a.pep 
orf58-l 



EAFGHDSQAVCPFENVPSERPSRRAXDTEADEGAFQSEETGAVSEHLPTTDLLLPPLFNP 
EAFGHDSQAVCPFENVPSERPSCRVSDTEADEGAFPSEETGAVSEHLPTTDLLLPPLFNP 
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orf58a.pep GATQTEEXLLXNSITIEEKXAEFKVKVKVVDSYSGPVITRYEIEPDVGVRGNSVLNLEKX 
or f 58-1 EATQTEEELLENSITIEEKLAEFKVKVKWDSYSGPVITRYEIEPDVGVRGNSVLNLEKD 



orfSBa.pep LARSLGVASIRVVETILGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDI 
or f 58-1 LARSLGVASIRVVETIPGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDI 



TGQPWTDLGKAPHLLVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIY 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
TGQPWTDLGKAPHLLVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIY 



>rf 58a . pep EGIPHLLAPVVTDMKLAANALNWCVNEMEKRYRLMSFMGVRNLAGXNQKIAEAAARGEKI 
>rf58-l EGIPHLLAPWTDMKLAANALNWCVNEMEKRYRLMSFMGVRNLAGFNQKIAEAAARGEKI 



>r f 58a . pep GNPFSLTPDNPEPLXKLPFIWWDEFADLMMTAGKKIEELIARLAQKARAAGIHLILAT 



irfSBa.pep QRPSVDVITGLIKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLPPGTAYPQR 
)rf58-l QRPSVDVITGLIKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLLPGTAYPQR 



)rf 58a . pep VHGAFASDEEVHRWEYLKQFGEPDYVDDXLSGGMSDDLLGISRSGDGETDPMYDEAVSV 
)rf58-l VHGAFASDEEVHRVVEYLKQFGEPDYVDDILSGGGSEELPGIGR3GDDETDPMYDEAVSV 



50 Homology with a predicted QRF from N. gonorrhoeae 

ORF58 shows complete identity over a 9aa overlap with a predicted ORF (ORF58ng) from N. 
gonorrhoeae: 

orfSS.pep ALMLFHAVKTAVYWLFVGWRFCRNYLAHESEPDRPVPP 103 
55 orfSSng SEPDRPVPPASANRADVPTASDGYSDSGNG 30 

The ORF58ng nucleotide sequence <SEQ ID 493> is predicted to encode a protein having partial 
amino acid sequence <SEQ ID 494>: 

1 . . S£PDi^PVPPA SANRADVPTA SDGYSDSGNG TEEAETEAAE AAEEEAADTE 

51 DIATAVIDNR RIPFDRSIAE GLMQSESKTS PVRPVFKEIT LEEATRALSS 

60 101 AALRETKKRY IDAFEKNGTA VPKVRVSDTP MEGLQIIGLD DPVLQRTYSR 

151 MFDADKEAFS ESADYGFEPY FEKQHPSAFS AVKAENARNA PFRRHAGQEK 

201 GQAEAKSPDV SQGQSVSDGT AVRDARRRVS VNLKEPNKAT VSAEARISRL 

251 IPESRTWGK RDVEMPSETE NVFTETVSSV GYGGPVYDEA ADIHIEEPAA 

301 PDAWWEPPE VPEVAVPEID ILPPPPVSEI YNRTYEPPAG FEQAQRSRIA 
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351 ETDHLAADVL NGGWQEETAA lADDGSEGAA ERSSGQYLSE TEAFGHDSQA 

401 VCPFEDVPSE RPSCRVSDTE ADEGAFQSEE TGAVSEHLPT TDLLLPPLFN 

451 PEATQTEEEL LENSITIEEK LAEFKVKVKV VDSYSGPVIT RYEIEPDVGV 

501 RGNSVLNLEK DLARSLGVAS IRWETIPGK TCMGLELPNP KRQMIRLSEI 

551 FNSPEFAESK SKLTLALGQD ITGQPWTDL GKAPHLLV AG TTGSGKS VGV 

601 NAMILSMLFK AAPEDVRMIM IDPKMLELSI YEGITHLLAP WTDMKLAAN 

651 ALNWCVNEME KRYRLMSFMG VRNLAGFNQK lAEAAARGEK IGNPFSLTPD 

701 DPEPLEK LPF IWWDEFAD LMMT AGKKIE ELIARLAQKA RAAGIHLILA 

751 TQRPSVDVIT GLIKANIPTR lAFQVSSKID SRTILDQMGA ENLLGQGDML 

801 FLPPGTAYPQ RVHGAFASDE EVHRVVEYLK QFGEPDYVDD ILSGGGSEEL 

851 PGIGRSGDGE TDPMYDEAVS WLKTRKASI SGVQRALRIG YNRAARLIDQ 

901 MEAEGIVSAP EHNGNRTILV PLDNA* 

This partial gonococcal sequence contains a predicted transmembrane region and a predicted 
ATP/GTP -binding site motif A (P-loop; double underlined). Furthermore, it has a domain 
homologous to the FTSK cell division protein of E. coli. Alignment of ORF58ng and FtsK 
(accession number p46889) show a 65 % amino acid identity in 459 overlap: 

ORF58ng: 467 lEEKLAEFKVKVKWDSYSGPVITRYEIEPDVGVRGNSVLNLEKDLARSLGVASIRWET 526 

+E +LA+F++K W+ GPVITR+E+ GV+ + NL +DLARSL ++RWE 
FtsK: 868 VEARLADFRIKADWNYSPGPVITRFELNLAPGVKAARISNLSRDLARSLSTVAVRWEV 927 

0RF58ng: 527 IPGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDITGQPWTDLGKAPHL 586 

IPGK +GLELPN KRQ + L E+ ++ +F ++ S LT+ LG+DI G+PW DL K PHL 
FtsK: 928 IPGKPYVGLELPNKKRQTVYLREVLDNAKFRDNPSPLTWLGKDIAGEPWADLAKMPHL 987 

ORF58ng: 587 LVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIYEGITHLLAPWTDMK 646 

LVAGTTGSGKSVGVNAMILSML+KA PEDVR IMIDPKMLELS+YEGI HLL WTDMK 
FtsK: 988 LVAGTTGSGKSVGVNAMILSMLYKAQPEDVRFIMIDPKMLELSVYEGIPHLLTEWTDMK 1047 

ORF58ng: 647 LAANALNWCVNEMEKRYRLMSFMGVRNLAGFNQKIAEAAARGEKIGNPFSLTPDDPEP— 704 

AANAL WCVNEME+RY+LMS +GVRNLAG+N+KIAEA I +P+ D + 

FtsK: 1048 DAANALRWCVNEMERRYKLMSALGVRNLAGYNEKIAEADR^MRPIPDPYWKPGDSMDAQH 1107 

ORF58ng: 705 — LEKLPFIVVVVDEFADLMMTAGKKIEELIARLAQKARAAGIHLILATQRPSVDVITGL 762 

L+K P+IVV+VDEFADLMMT GKK+EELIARLAQKARAAGIHL+LATQRPSVDVITGL 
FtsK: 1108 PVLKKEPYIWLVDEFADLMMTVGKKVEELIARLAQKARAAGIHLVLATQRPSVDVITGL 1167 

ORF58ng: 763 IKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLPPGTAYPQRVHGAFASDEEV 822 

IKANIPTRIAF VSSKIDSRTILDQ GAE+LLG GDML+ P + P RVHGAF D+EV 
FtsK: 1168 IKANIPTRIAFTVSSKIDSRTILDQAGAESLLGMGDMLYSGPNSTLPVRVHGAFVRDQEV 1227 

ORF58ng: 823 HRWEYLKQFGEPDYVDDILSGGGSEELPGIGRSGDGETDPMYDEAVSWLKTRKASISG 882 

H W+ K G P YVD IS SE G G G E DP++D+AV V + RKASISG 
FtsK: 1228 HAVVQDWKARGRPQYVDGITSDSESEGGAG-GFDGAEELDPLFDQAVQFVTEKRKASISG 1286 

ORF58ng: 883 VQRALRIGYNRAARLIDQMEAEGIVSAPEHNGNRTILVP 921 

VQR RIGYNRAAR+I+QMEA+GIVS HNGNR +L P 
FtsK: 1287 VQRQFRIGYNRAARIIEQMEAQGIVSEQGHNGNREVLAP 1325 

Further work on ORFSSng revealed the complete gonococcal DNA sequence to be <SEQ ID 495>: 



1 ATGTTTTGGA TAGTTTTGAT CGTTATtgtg TTGCTTGCGC TTGCCGGCCT 

51 GTTTTTTGTC CGCGCACAAT CCGAACGCGA GTGGATGCGC GAGGTTTCTG 

101 CGTGGCAGGA AAAGAAAGGG GAAAAACAGG CGGAGCTGCC TGAAATCAAA 

151 GACGGTATGC CCGATTTTCC CGAGTTTTCC CTGATGCTTT TCCATGCCGT 

201 CAAAACGGCA GTGTATTGGC TGTTTGTCGG TGTCGTCCGT TTCTGCCGTiA 

2 51 ACTATCTGGC GCACGAATCC GAACCGGACA GGCCCGTTCC GCCTGCTTCT 

301 GCAAACCGTG CGGATGTTCC GACCGCATCC GACGGGTATT CAGACAGTGG 

351 AAACGGGACG GAAGAAGCGG AAACGGAAGC AGCAGAAGCT GCGGAGGAAG 

401 AGGCTGCCgA TACgGAAGAC ATTGCAACTG CCGTAATCGA CAACCGCCGC 

4 51 ATCCcatTCG ACCGGAGTAT TGCTGAAGGG TTGATGCAGT CTGAAAGCAA 

501 AACTTCGCCC GTCCGTCCGG TTTTTAAGGA AATCACTTTG GAAGAAGCAA 

551 CGCGTGCTTT AAGCAGCGCG GCTTTAAGGG AAACGAAAAA ACGCTATATC 

601 GATGCATTTG AGAAAAACGG AACAGCCGTC CCCAAAGTAC GCGTGTCCGA 

651 TACCCCGATG GAAGGGCTGC AGATTATCGG TTTGGACGAC CCTGTGCTTC 

701 AACGCACGTA TTCCCGTATG TTTGATGCGG ACAAAGAAGC GTTTTCCGAG 

751 TCTGCGGATT ACGGATTTGA GCCGTATTTT GAGAAGCAGC ATCCGTCTGC 
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801 CTTTTCTGCA GTCAAAGCCG AAAATGCACG GAflTGCGCCG TTCCGCCGTC 

851 ATGCAGGGCA GGAGAAAGGG CAGGCGGAGG CAAAATCCCC GGATGTTTCC 

901 CAAGGGCAGT CCGTTTCAGA CGGCACAGCC GTCCGCGATG CCCGCCGCCG 

951 CGTTTCCGTC AATTTGAAAG AACCGAACAA GGCAACGGTT TCTGCGGAGG 

1001 CGCGGATTTC GCGCCTGATT CCGGAAAGTC GGACGGTTGT CGGGAAACGG 

1051 GATGTCGAAA TGCCGTCTGA AACCGAAAAT GTTTTCACGG AAACCGTTTC 

1101 GTCTGTGGGA TACGGCGGTC CGGTTTATGA TGAAGCTGCC GATATCCATA 

1151 TTGAAGAGCC TGCCGCGCCC GATGCTTGGG TGGTCGAACC ACCCGAAGTG 

1201 CCGGAGGTAG CCGTACCCGA AATCGATATT CTGCCGCCGC CTCCCGTATC 

1251 GGAAATCTAC AACCGTACCT ATGAGCCGCC GGCAGGATTC GAGCAGGCGC 

1301 AACGCAGCCG CATTGCCGAA ACCGACCATC TTGCCGCTGA TGTTTTGAAT 

1351 GGAGGTTGGC AGGAGGAAAC CGCCGCTATT GCAGATGACG GCAGTGAGGG 

1401 TGCGGCAGAG CGGTCAAGCG GGCAATATCT GTCGGAAACC GAAGCGTTCG 

1451 GGCATGACAG TCAGGCGGTT TGTCCGTTTG AAGATGTGCC GTCTGAACGC 

1501 CCGTCCTGCC GGGTATCGGA TACGGAAGCG GATGAAGGGG CGTTCCAATC 

1551 GGAAGAGACC GGTGCGGTAT CCGAACACCT GCCGACAACC GACCTGCTTC 

1601 TGCCTCCGCT GTTCAATCCC GAGGCGACGC AAACCGAAGA AGAACTGTTG 

1651 GAAAACAGCA TCACCATCGA AGAAAAATTG GCGGAGTTCA AAGTCAAGGT 

1701 CAAGGTTGTC GATTCTTATT CCGGCCCCGT GATTACGCGT TATGAAATCG 

1751 AACCCGATGT CGGCGTGCGC GGCAATTCCG TTCTGAATTT GGAAAAAGAC 

1801 TTGGCGCGTT CGCTCGGCGT GGCTTCCATC CGCGTTGTCG AAACCATCCC 

1851 CGGCAAAACC TGCATGGGTT TGGAACTTCC GAACCCGAAA CGCCAAATGA 

1901 TACGCCTGAG CGAAATTTTC AATTCGCCCG AGTTTGCCGA ATCCAAATCC 

1951 AAGCTGACGC TCGCGCTCGG TCAGGACATT ACCGGACAGC CCGTCGTAAC 

2001 CGACTTGGGC AAAGCACCGC ATTTGCTGGT TGCCGGCACG ACCGGTTCGG 

2051 GCAAATCGGT GGGTGTCAAC GCGATGATTC TGTCTATGCT TTTCAAAGCC 

2101 GCGCCGGAAG ACGTGCGTAT GATTATGATC GATCCGAAAA TGCTGGAATT 

2151 GAGCATTTAC GAAGGCATCA CGCACCTGCT CGCCCCTGTC GTTACCGATA 

2201 TGAAGCTGGC GGCAAACGCG CTGAACTGGT GTGTTAACGA AATGGAAAAA 

2251 CGCTACCGCC TGATGAGCTT TATGGGCGTG CGCAATCTTG CGGGCTTCAA 

2301 CCAAAAAATC GCCGAAGCCG CAGCAAGGGG AGAAAAAATC GGCAATCCGT 

2351 TCAGCCTCAC GCCCGACGAT CCCGAACCTT TGGAAAAACT GCCGTTTATC 

2401 GTGGTCGTGG TCGATGAGTT TGCCGATTTG ATGATGACGG CAGGCAAGAA 

24 51 AATCGAAGAA CTGATTGCGC GCCTCGCCCA AAAAGCCCGC GCGGCAGGCA 

2501 TCCACCTTAT CCTTGCCACA CAACGCCCCA GCGTCGATGT CATCACGGGT 

2551 CTGATTAAGG CGAACATCCC GACGCGTATC GCGTTCCAAG TGTCCAGCAA 

2601 AATCGACAGC GGCAGGATTC TCGACCAAAT GGGCGCGGAA AACCTGCTCG 

2651 GTCAGGGCGA TATGCTGTTC CTGCCGCCGG GTACTGCCTA TCCGCAGCGC 

2701 GTTCACGGCG CGTTTGCCTC GGATGAAGAG GTGCACCGCG TGGTCGAATA 

2751 TCTGAAGCAG TTTGGCGAGC CGGACTATGT TGACGATATT TTGAGCGGCG 

2801 GCGGCAGCGA AGAGCTGCCC GGCATCGGGC GCAGCGGCGA CGGCGAAACC 

2851 GATCCGATGT ACGACGAGGC CGTATCCGTT GTCCTGAAAA CGCGCAAAGC 

2901 CAGCATTTCG GGCGTACAGC GCGCCTTGCG CATCGGCTAC AACCGCGCCG 

2951 CGCGTCTGAT TGACCAAATG GAAGCGGAAG GCATTGTGTC CGCACCGGAA 

3001 CACAACGGCA ACCGTACGAT TCTCGTCCCC TTGGACAATG CTTGA 

This corresponds to the amino acid sequence <SEQ ID 496; ORF58ng-l>: 

1 MFWIVLIVIV LLALAGLFFV RAQS EREWMR EVSAWQEKKG EKQAELPEIK 

51 DGMPDFPEFS LM LFHAVKTA VYWLFVGW R FCRNYLAHES EPDRPVPPAS 

101 ANRADVPTAS DGYSDSGNGT EEAETEAAEA AEEEAADTED lATAVIDNRR 

151 IPFDRSIAEG LMQSESKTSP VRPVFKEITL EEATRALSSA ALRETKKRYI 

201 DAFEKNGTAV PKVRV3DTPM EGLQIIGLDD PVLQRTYSRM FDADKEAFSE 

251 SADYGFEPYF EKQHPSAF3A VKAENARNAP FRRHAGQEKG QAEAKSPDVS 

301 QGQSVSDGTA VRDARRRVSV NLKEPNKATV SAEARISRLI PESRTWGKR 

351 DVEMPSETEN VFTETVSSVG YGGPVYDEAA DIHIEEPAAP DAWWEPPEV 

401 PEVAVPEIDI LPPPPVSEIY NRTYEPPAGF EQAQRSRIAE TDHLAADVLN 

451 GGWQEETAAI ADDGSEGAAE RSSGQYLSET EAFGHDSQAV CPFEDVPSER 

501 PSCRVSDTEA DEGAFQSEET GAVSEHLPTT DLLLPPLFNP EATQTEEELL 

551 ENSITIEEKL AEFKVKVKVV DSYSGPVITR YEIEPDVGVR GNSVLNLEKD 

601 LARSLGVASI RWETIPGKT CMGLELPNPK RQMIRLSEIF NSPEFAESKS 

651 KLTLALGQDI TGQPWTDLG KAPHLLVAGT TGSGKSVGVN AMILSMLFKA 

701 APEDVRMIMI DPKMLELSIY EGITHLLAPV VTDMKLAANA LNWCVNEMEK 

751 RYRLMSFMGV RNLAGFNQKI AEAAARGEKI GNPFSLTPDD PEPLEKLPFI 

801 WWDEFADL MMT AGKKIEE LIARLAQKAR AAGIHLILAT QRPSVDVITG 

851 LIKANIPTRI AFQVSSKIDS RTILDQMGAE NLLGQGDMLF LPPGTAYPQR 

901 VHGAFASDEE VHRWEYLKQ FGEPDYVDDI LSGGGSEELP GIGRSGDGET 

951 DPMYDEAVSV VLKTRKASIS GVQRALRIGY NRAARLIDQM EAEGIVSAPE 

1001 HNGNRTILVP LDNA* 

ORF58ng-l and ORF58-1 show 97.2% identity in 1014 aa overlap: 
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10 20 30 40 50 60 

orf 58-1. pep MFWIVLIVILLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPDFPELA 

orf58ng-l MFWIVLIVIVLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPDFPEFS 
10 20 30 40 50 60 

70 80 90 100 110 120 

orf 58-1 . pep LMLFHAVKTAVYWLFVGVVRFCRNYLAHESEPDRPVPPASANRADVPTASDGYSDSGNGT 

orf58ng-l LMLFHAVKTAVYWLFVGVVRFCRNYLAHESEPDRPVPPASANRADVPTASDGYSDSGNGT 
70 80 90 100 110 120 



130 140 150 160 170 180 

or f 5 8 - 1 . pep EEAETEEAEAAEEEAADTEDIATAVIDNRRI PFDRS lAEGLMPSESEI S PVRPVFKEITL 

orf58ng-l EEAETEAAEAAEEEAADTEDIATAVIDNRRIPFDRSIAEGLMQSESKTSPVRPVFKEITL 
130 140 150 160 170 180 



>rf 58-1. pep EEATRALNSAALRETKKRYIDAFEKNETAVPKVRVSDTPMEGLQIIGLDDPVLQRTYSHM 
I I I I I I I : I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I 
)rf58ng-l EEATRALSSAALRETKKRYIDAFEKNGTAVPKVRVSDTPMEGLQIIGLDDPVLQRTYSRM 



irf 58-1. pep FDADKEAFSESADYGFEPYFEKQHPSAFSAVKAENARNAPFHRHAGQGKGQAEAKSPDVS 
irf58ng-l FDADKEAFSESADYGFEPYFEKQHPSAFSAVKAENARNAPFRRHAGQEKGQAEAKSPDVS 



■rf 58-1 . pep QGQSVSDGTAVRDARRRVSVNLKEPNKATVSAEARISRLIPESQTWGKRDVEMPSETEN 
irfSSng-l QGQSVSDGTAVRDARRRVSVNLKEPNKATVSAEARISRLIPESRTWGKRDVEMPSETEN 



370 380 390 400 410 420 

orf 58-1 . pep VFTETVSSVGYGGPVYDETADIHIEEPAAPDAWVVEPPEVPKVPMTAIDIQPPPPVSEIY 

orf58ng-l VFTETVSSVGYGGPVYDEAADIHIEEPAAPDAWWEPPEVPEVAVPEIDILPPPPVSEIY 
370 380 390 400 410 420 



orf 58-1. pep NRTYEPPSGFEQVQRSRIAETDHLADDVLNGGWQEETAAIADDGSEGAAERSSGQYLSET 
orf58ng-l NRTYEPPAGFEQAQRSRIAETDHLAADVLNGGWQEETAAIADDGSEGAAERSSGQYLSET 



orf58-l.pep 
orf58ng-l 



EAFGHDSQAVCPFENVPSERPSCRVSDTEADEGAFPSEETGAVSEHLPTTDLLLPPLFNP 
EAFGHDSQAVCPFEDVPSERPSCRVSDTEADEGAFQSEETGAVSEHLPTTDLLLPPLFNP 



orf 58-1 . pep EATQTEEELLENSITIEEKLAEFKVKVKVVDSYSGPVITRYEIEPDVGVRGNSVLNLEKD 
orf58ng-l EATQTEEELLENSITIEEKLAEFKVKVKWDSYSGPVITRYEIEPDVGVRGNSVLNLEKD 



orf 58-1 . pep LARSLGVASIRWETIPGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDI 
orf58ng-l LARSLGVASIRVVETIPGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDI 



irf 58-1 . pep TGQPWTDLGKAPHLLVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIY 
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EGIPHLLAPWTDMKLAANALNWCVNEMEKRYRLMSFMGVRNLAGFNQKIAEAAARGEKI 
EGITHLLAPVVTDMKLflANALNWCVNEMEKRYRLMSFMGVRNLAGFNQKIAEAAARGEKI 



GNPFSLTPDDPEPLEKLPFIVWVDEFADLMMTAGKKIEELIARLAQKARAAGIHLILAT 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I M I I I I I 
GNPFSLTPDDPEPLEKLPFIVWVDEFADLMMTAGKKIEELIARLAQKARAAGIHLILAT 



irf 58-1. pep QRPSVDVITGLIKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLLPGTAYPQR 
irfSBng-l QRPSVDVITGLIKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLPPGTAYPQR 



irf 58-1 .pep VHGAFASDEEVHRVVEYLKQFGEPDYVDDILSGGGSEELPGIGRSGDDETDPMYDEAVSV 



970 980 990 1000 1010 

VLKTRKASISGVQRALRIGYNRAARLIDQMEAEGIVSAPEHNGNRTILVPLDNAX 

VLKTRKAS I SGVQRALRIGYNRAARLI DQMEAEGI VSAPEHNGNRT I LVPLDNAX 
970 980 990 1000 1010 



30 Furthennore, ORF58ng-l shows significant homology to the E.coli protein FtsK: 

spl P4 688 9 I FTSK_ECOLI CELL DIVISION PROTEIN FTSK >gi 1 1651412 | gnl | PID | dl015290 (Dl 
division protein FtsK [Escherichia coli] >gi | 1651418 | gnl| PID | dl015296 (D90727) Cell 
division protein FtsK [Escherichia coli] >gi 11787117 (AE000191) cell division 
protein FtsK [Escherichia coli] Length = 1329 
35 Score = 576 bits (1469), Expect = e-163 

Identities = 301/459 (65%), Positives = 353/459 (76%), Gaps = 5/459 (1%) 



Sbjct: 



Sbjct: 



556 lEEKLAEFKVKVKVVDSYSGPVITRYEIEPDVGVRGNSVLNLEKDLARSLGVASIRWET 615 

+E +LA+F++K W+ GPVITR+E+ GV+ + NL +DLARSL ++RWE 
868 VEARLADFRIKADWNYSPGPVITRFELNLAPGVKAARISNLSRDLARSLSTVAVRWEV 927 

615 IPGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDITGQPVVTDLGKAPHL 67 5 

IPGK +GLELPN KRQ + L E+ ++ +F ++ S LT+ LG+DI G+PVV DL K PHL 
92 8 IPGKPYVGLELPNKKRQTVYLREVLDNAKFRDNPSPLTVVLGKDIAGEPVVADLAKMPHL 987 

676 LVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIYEGITHLLAPWTDMK 735 

LVAGTTGSGKSVGVNAMILSML+KA PEDVR IMIDPKMLELS+YEGI HLL WTDMK 
98 8 LVAGTTGSGKSVGVNAMILSMLYKAQPEDVRFIMIDPKMLELSVYEGIPHLLTEWTDMK 104 7 

736 LAANALNWCVNEMEKRYRLMSFMGVRNLAGFNQKIAEAAARGEKIGNPFSLTPDDPEP-- 793 

AANAL WCVNEME+RY+LMS +GVRNLAG+N+KIAEA I +P+ D + 

104 8 DAANALRWCVNEMERRYKLMSALGVRNLAGYNEKIAEADRMMRPIPDPYWKPGDSMDAQH 1107 

Query: 794 — LEKLPFIWWDEFADLMMTAGKKIEELIARLAQKARAAGIHLILATQRPSVDVITGL 851 

L+K P+IW+VDEFADLMMT GKK+EELIARLAQKARAAGIHL+LATQRPSVDVITGL 
Sbjct: 1108 PVLKKEPYIWLVDEFADLMMTVGKKVEELIARLAQKARAAGIHLVLATQRPSVDVITGL 1157 

Query: 852 IKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLPPGTAYPQRVHGAFASDEEV 911 

IKANIPTRIAF VSSKIDSRTILDQ GAE+LLG GDML+ P + P RVHGAF D+EV 
Sbjct: 1168 IKANIPTRIAFTVSSKIDSRTILDQAGAESLLGMGDMLYSGPNSTLPVRVHGAFVRDQEV 1227 

Query: 912 HRWEYLKQFGEPDYVDDILSGGGSEELPGIGRSGDGETDPMYDEAVSWLKTRKASISG 971 

H W+ K G P YVD IS SE G G G E DP++D+AV V + RKASISG 
Sbjct: 1228 HAWQDWKARGRPQYVDGITSDSESEGGAG-GFDGAEELDPLFDQAVQFVTEKRKASISG 1286 

Query; 972 VQRALRIGYNRAARLIDQMEAEGIVSAPEHNGNRTILVP 1010 

VQR RIGYNRAAR+I+QMEA+GIVS HNGNR +L P 
Sbjct; 1287 VQRQFRIGYNRAARIIEQMEAQGIVSEQGHNGNREVLAP 1325 
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Based on this analysis, it is predicted that the proteins from N. meningitidis and N.gonorrhoeae, 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 59 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 497>: 

1 ATGATTTftTC AAAGAAACCT CATCAAAGAA CTCTCTTTTA CCGCCGTCGG 

51 CATTTTCGTC GTCCTCTTGG CGGTATTGGT CTCCACGCAG GCAATCAACC 

101 TGCTCGGCCG TGCCGCCGAC GGGC..GTGA TCGCCATCGA TGCCGTGTTG 

151 GCATTGGTCG GCTTCTGGGT C 

// 

901 A TTGCCATCGG TTTGTTTTTA ATTTACCAAA ACGGGCTGAC 

951 CCTGCTTTTT GAAGCCGTGG AAGACGGCAA AATCCATTTT TGGCTCGGAC 

1001 TGCTGCCTAT GCACATTATC ATGTTTGTCC TTGCACTCAT CCTGTTGCGC 

1051 GTCCGCAGTA TGCCCAGCCA GCCCTTCTGG CAGGCGGTTG GCAAAAGTCT 

1101 GACATTGAAA GGCGGAAAAT GA 

This corresponds to the amino acid sequence <SEQ ID 498; ORF101>: 



Further work revealed the complete nucleotide sequence <SEQ ID 499>: 



1 ATGATTTATC AAAGAAACCT CATCAAAGAA CTCTCTTTTA CCGCCGTCGG 

51 CATTTTCGTC GTCCTCTTGG CGGTATTGGT CTCCACGCAG GCAATCAACC 

101 TGCTCGGCCG TGCCGCCGAC GGGCGTGTCG CCATCGATGC CGTGTTGGCA 

151 TTGGTCGGCT TCTGGGTCAT CGGTATGACG CCGCTTTTGC TGGTGTTGAC 

201 CGCATTTATC AGTACGTTGA CCGTGTTGAC CCGCTACTGG CGCGACAGCG 

251 AAATGTCGGT CTGGCTATCC TGCGGATTGG CATTGAAACA ATGGATACGC 

301 CCGGTGATGC AGTTTGCCGT GCCGTTTGCC GTTTTGGTTG CCGTCATGCA 

351 GCTTTGGGTG ATACCGTGGG CAGAGCTACG CAGCCGCGAA TACGCTGAAA 

401 TCCTGAAGCA GAAGCAGGAA TTGTCTTTGG TGGAGGCAGG CGAGTTCAAC 

451 AGTTTGGGCA AGCGCAACGG CAGGGTTTAT TTTGTCGAAA CCTTCGATAC 

501 CGAATCCGGC ATCATGAAAA ACCTGTTCCT GCGCGAACAG GACAAAAACG 

551 GCGGCGACAA CATCATCTTC GCCAAAGAAG GTAACTTCTC GCTGAACGAC 

601 AACAAACGCA CGCTCGAATT GCGCCACGGC TACCGTTACA GCGGCACGCC 

651 CGGACGCGCC GACTACAATC AGGTTTCCTT CCAAAAACTC AACCTGATTA 

701 TCAGCACCAC GCCCAAACTC ATCGACCCCG TTTCCCACCG CCGTACCATT 

751 CCGACCGCCC AACTGATTGG CAGCAGCAAC CCGCAACATC AGGCGGAATT 

801 GATGTGGCGC ATCTCGCTGA CCGTCAGCGT CCTCCTACTC TGCCTGCTTG 

851 CCGTGCCGCT TTCCTATTTC AACCCGCGCA GCGGACATAC CTACAATATC 

901 TTGATTGCCA TCGGTTTGTT TTTAATTTAC CAAAACGGGC TGACCCTGCT 

951 TTTTGAAGCC GTGGAAGACG GCAAAATCCA TTTTTGGCTC GGACTGCTGC 

1001 CTATGCACAT TATCATGTTT GCCGTTGCAC TCATCCTGTT GCGCGTCCGC 

1051 AGTATGCCCA GCCAGCCCTT CTGGCAGGCG GTTGGCAAAA GTCTGACATT 

1101 GAAAGGCGGA AAATGA 

This corresponds to the amino acid sequence <SEQ ID 500; ORF101-1>: 



1 MIYQRHLIKE LSFTAVGIFV VLLAVLVSTQ A INLLGRAAD GRVAIDAVLA 

51 LVGFWVIGMT PLLL VLTAFI STLTVLTRYW RDSEMSVWLS CGLALKQWIR 

101 PVMQ FAVPFA VLVAVMQLWV I PWAELRSRE YAEILKQKQE LSLVEAGEFK 

151 SLGKRNGRVY FVETFDTESG IMKNLFLREQ DKNGGDNIIF AKEGNFSLND 

201 NKRTLELRHG YRYSGTPGRA DYNQVSFQKL NLIISTTPKL IDPVSHRRTI 

251 PTAQLIGSSN PQHQAELMWR ISLTVSVLLL CLLAVPL SYF NPRSGHTYNI 

301 LIAIGLFLIY QNGLTL LFEA VEDGKIHFWL GLLPMHIIMF AVALILL RVR 

351 SMPSQPFWQA VGKSLTLKGG K* 



Computer analysis of this amino acid sequence gave the following results: 
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Homology with a predicted ORF from N.meninsitidis (strain A) 

ORFlOl shows 91.2% identity over a 57aa overlap and 95.7% identity over a 69aa overlap with 
an ORF (ORF 101 a) from strain A of 7^. meningitidis: 

10 20 30 40 50 

orf 101 . pep MIYQRNLIKELSFTAVGIFWLLAVLVSTQAINLLGRAADGXVIAIDAVLALVGFWVX 

orflOla MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGXAADXRX-AIDAVLALVGFWVXXM 
10 20 30 40 50 

// 

90 100 110 

orf 101 .pep lAIGLFLIYQNGLTLLFEAVEDGKIHFWLGL 

orf 101a LTVSVLLLCLLAVPLSYFNPRSGHTYNILXAIGLFLIYQNGLTLLFEAVEDGKIHFWLGL 



120 130 140 150 

orf 101 .pep LPMHIIMFVLALILLRVRSMPSQPFWQAVGKSLTLKGGKX 

or f 1 0 1 a LPMHI IMFVIAI VLLRVRSMPSQPFWQAVGKSLTLKGGKX 

340 350 360 370 

The complete length ORFlOla nucleotide sequence <SEQ ID 501> is: 



1001 
1051 
1101 



ATGATTTATC 
CATTTTCGTC 
TGCTCGGCCN 
TTGGTCGGCT 
CGCATTTATC 
AAATGTCGGT 
CCGGTGATGC 
GCTTTGGGTG 
TCCTGAAGCA 
AGTTTGGGCA 
CGAATCCGGC 
GCGGCGACAA 
AACAAACGCA 
CGGACGCGCC 
TCAGCACCAC 
CCNACNGCCC 
GATGTGGCGC 
CCGTGCCGCT 
TTGANTGCCA 
TTTTGAAGCC 
CTATGCACAT 
AGCATGCCCA 
GAAAGGCGGA 



AAAGAAACCT 
GTCCTCTTGG 
TGCCGCCGAC 
TCTGGGTCNN 
AGTACGTTGA 
CTGGNTATCC 
AGTTTGCCGT 
ATACCGTGGG 
GAAGCAGGAA 
AGCGCAACGG 
ATCATGAAAA 
CATCATCTTC 
CGCTCGAATT 
GACTACAATC 
GCCCAAACTC 
AACTGATTGG 
ATCTCGCTGA 
TTCCTATTTC 
TCGGTTTGTT 
GTGGAAGACG 
CATCATGTTC 
GCCAGCCCTT 
AAATGA 



CATCAAAGAA 
CGGTATTGGT 
NGGCGTNTCG 
NNGNATGACG 
CCGTGTTGAC 
TGCGGATTGG 
GCCGTTTGCC 
CAGAGCTACG 
TTGTCTTTGG 
CAGGGTTTAT 
ACCTGTTCCT 
NCCAAAGAAA 
GCGCCACGGC 
AGGTTTCCTT 
ATCGACCCCG 
CAGCAGCAAC 
CCGTCAGCGT 
AACCCGCGCA 
TTTAATTTAC 
GCAAAATCCA 
GTCATCGCAA 
CTGGCAGGCG 



CTCTCTTTTA 
CTCCACGCAG 
CCATCGATGC 
CCGCTTTTGC 
CCGCTACTGG 
CATTGAAACA 
GTTTTGGTTG 
CAGCCGCGAA 
TGGAGGCAGG 
TTTGTCGAAA 
GCGCGAACAG 
GTAACTTCTC 
TACCGTTACA 
CCNAAAACTC 
TTTCCCACCG 
CCGCAACATC 
CCTCCTACTC 
GCGGACATAC 
CAAAACGGGC 
TTTTTGGCTC 
TCGTACTTCT 
GTTGGCAAAA 



CCGCCGTCGG 
GCAATCAACC 
CGTGTTGGCA 
TNGTGTTGAC 
CGNGACAGCG 
ATGGATACGC 
CCGTCATGCA 
TACGCTGAAA 
CGGGTTCAAC 
CCTTCGATAC 
GACAAAAACG 
GCTGAACGAC 
GCGGCACGCC 
AACCTGATTA 
CCGTACNATN 
ANGCGGAATT 
TGCCTGCTTG 
CTACAATATC 
TGACCCTGCT 
GGACTGCTGC 
GCGCGTCCGC 
GTCTGACATT 



This encodes a protein having amino acid sequence <SEQ ID 502>: 



1 MIYQRNLIKE LSFTAVGIFV VLLAVLVSTQ A INLLGXAAD XRXAIDAVLA 

51 LVGFWVXXMT PLLL VLTAFI STLTVLTRYW RDSEMSVWXS CGLALKQWIR 

101 PVMQ FAVPFA VLVAVMQLWV I PWAELRSRE YAEILKQKQE LSLVEAGGFN 

151 SLGKRNGRVY FVETFDTESG IMKNLFLREQ DKNGGDNIIF XKESNFSLND 

201 NKRTLELRHG YRYSGTPGRA DYNQVSFXKL NLIISTTPKL IDPVSHRRTX 

251 PTAQLIGSSN PQHXAELMWR ISLTVSVLLL CLLAVPL SYF NPRSGHTYNI 

301 LXAIGLFLIY QNGLTL LFEA VEDGKIHFWL GLLPMHIIMF VIAIVLL RVR 

351 SMPSQPFWQA VGKSLTLKGG K* 

ORFlOla and ORFlOl-1 show 95.4% identity in 371 aa overlap: 

orf lOla . pep MIYQRNLIKELSFTAVGIFWLLAVLVSTQAINLLGXAADXRXAIDAVLALVGFWVXXMT 60 
orf 101-1 MIYQRNLIKELSFTAVGIFWLLAVLVSTQAINLLGRAADGRVAIDAVLALVGFWVIGMT 60 



orf 101a .pep PLLLVLTAFISTLTVLTRYWRDSEMSVWXSCGLALKQWIRPVMQFAVPFAVLVAVMQLWV 120 
orf 101-1 PLLLVLTAFISTLTVLTRYWRDSEMSVWLSCGLALKQWIRPVMQFAVPFAVLVAVMQLWV 120 
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orflOla.pep IPWAELRSREYAEILKQKQELSLVEAGGFNSLGKRNGRVYFVETFDTESGIMKNLFLREQ 180 

orf 101-1 IPWAELRSREYAEILKQKQELSLVEAGEFNSLGKRNGRVYFVETFDTESGIMKNLFLREQ 180 

orflOla.pep DKNGGDNIIFXKESNFSLNDNKRTLELRHGYRYSGTPGRADYNQVSFXKLNLIISTTPKL 240 

I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I I I I I I I 
orf 101-1 DKNGGDNIIFAKEGNFSLNDNKRTLELRHGYRYSGTPGRADYNQVSFQKLNLIISTTPKL 240 

orflOla.pep IDPVSHRRTXPTAQLIGSSNPQHXAELMWRISLTVSVLLLCLLAVPLSYFNPRSGHTYNI 300 

orf 101-1 IDPVSHRRTIPTAQLIGSSNPQHQAELMWRISLTVSVLLLCLLAVPLSYFNPRSGHTYNI 300 

orflOla.pep LXAIGLFLIYQNGLTLLFEAVEDGKIHFWLGLLPMHIIMFVIAIVLLRVRSMPSQPFWQA 360 

orf 101-1 LIAIGLFLIYQNGLTLLFEAVEDGKIHFWLGLLPMHIIMFAVALILLRVRSMPSQPFWQA 3 60 

orflOla.pep VGKSLTLKGGK 371 

I I I I I I I I I I I 
orflOl-1 VGKSLTLKGGK 371 

Homology with a predicted ORF from N.sonorrhoeae 

ORFlOl shows 96.5 % identity in 57aa overlap at the N-terminal domain and 95.1% identity in 
61aa overlap at the C-terminal domain, respectively, with a predicted ORF (ORFlOlng) from N. 
gonorrhoeae: 

orf 101. pep MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGRAADGXVIAIDAVLALVGFWV 57 
orflOlng MIYQRNLIKELSFTAVGIFWLLAVLVSTQAINLLGRAADGRV-AIDAVLALVGFWVIGM 59 

// 

orf 101. pep lAIGLFLIYQNGLTLLFEAVEDGKIHFWLG 333 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orflOlng SLTVSVLLLCLLAVPLSYFNPRSGHTYNILIAIGLFLIYQNGLTLLFEAVEDGKIHFWLG 331 

orf 101. pep LLPMHIIMFVLALILLRVRSMPSQPFWQAVGKSLTLKGGK 373 

orflOlng LLPMHIIMFVIAIVLLRVRSMPSQPFWQAVG 362 

The ORFlOlng nucleotide sequence <SEQ ID 503> is predicted to encode a protein having partial 
amino acid sequence <SEQ ED 504>: 



1 MIYORNLIKE LSFTAVGIFV V LLAVLVSTQ AINLLGRAAD GRVAIDAVLA 

51 LVGFWVIGMT PLLL VLTAFI STLTVLTRYW RDSEMSVWLS CGLALKQWIR 

101 PVMQ FAVPFA ILIAVMQLWV I PWAELRSRE YAEILKQKQE LSLVEAGEFN 

151 NLGKRNGRVY FVETFDTESG IMKNLFLREQ DKNGGDNIIF AKEGNFSLKD 

201 NKRTLELRHG YRYSGTPGRA DYNQVSFQKL NLIISTTPKL IDPVSHRRTI 

251 STAQLIGSSN PQHQAELMWR ISLTVSVLLL CLLAVPL SYF NPRSGHTYNI 

301 LIAIGLFLIY QNGLTL LFEA VEDGKIHFWL GLLPMHIIMF VIAIVLL RVR 

351 SMPSQPFWQA VG. . . 

Further work revealed the complete nucleotide sequence <SEQ ID 505>: 

1 ATGATTTATC AAAGAAACCT CATCAAAGAA CTCTCTTTTA CCGCCGTCGG 

51 CATTTTCGTC GTCCTCTTGG CGGTGTTGGT GTCCACGCAG GCGATCAACC 

101 TGCTTGGCCG CGCAGCTGAC GGGCGTGTCG CCATCGATGC CGTGTTGGCC 

151 TTAGTCGGCT TCTGGGTCAT CGGTATGACC CCGCTTTTGC TGGTGTTGAC 

201 CGCATTCATC AGCACGCTGA CCGTATTGAC CCGCTACTGG CGCGACAGCG 

251 AAATGTCGGT CTGGCTATCC TGCGGATTGG CGTTGAAACA GTGGATACGC 

301 CCCGTCATGC AGTTTGCCGT GCCGTTTGCC ATCCTGATTG CCGTCATGCA 

351 GCTTTGGGTG ATACCGTGGG CAGAGCTGCG CAGCCGCGAA TATGCCGAAA 

4 01 TTTTGAAGCA GAAGCAGGAA TTGTCTTTGG TGGAAGCCGG CGAGTTCAAT 

4 51 AACTTGGGCA AGCGCAACGG CAgggtttaT TtcgtcgaaA CCTTTGACAC 

501 CGaatCcgGC ATCATGAAAA ACCTGTtCCt GcGCGAACAG GACAAAAACG 

551 gcggcgacaA CATCATCTTC GCcaaaGAag gtaactTctc gctgaaggaC 
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1001 
1051 
1101 



AACAAAcgca 
CGGacGCGCc 
TCAGCACCAC 
tcgacCGCCC 
GATGTGGCGC 
CCGTGCCGCT 
TTGATTGCCA 
TTTTGAAGCC 
CTATGCACAT 
AGTATGCCCA 
GAAAGgcgGA 



cgctcgaATT 
gactaCAATC 
GCCCAAacTT 
AAcTGATTGG 
ATCTCGCTGA 
TTCCTATTTC 
TCGGTTTGTT 
GTGGAAGACG 
CATCATGTTC 
GCCAGCCCTT 
AAATGA 



GCGCCACGGC 
AGGTTtcctt 
ATCGaccCCG 
CAGCAGCAAT 
CCGTCAGCGT 
AACCCGCGCA 
TTTAATTTAC 
GCAAAATCCA 
GTCATCGCAA 
CTGGCAGGCG 



TACCGTTACA 
cCAAAAacTc 
TTTCCCACCG 
CCGCAACATC 
CCTCCTGCTC 
GCGGACATAC 
CAAAACGGGC 
TTTTTGGCTC 
TCGTACTTCT 
GTTGGCAAAA 



GCGGcacgcC 
aacctgATta 
CCGCACCATT 
AGGCAGAATT 
TGCCTACTCG 
CTACAATATC 
TGACCCTGCT 
GGACTGCTGC 
GCGCGTCCGC 
GTCTGACATT 



This corresponds to the amino acid sequence <SEQ ID 506; ORF101ng-l>: 

1 MIYQRHLIKE LSFTAVGIFV VLLAVLVSTQ A IHLLGRAAD GRVAIDAVLA 

51 LVGFWVIGMT PLLL VLTAFI STLTVLTRYW RDSEMSVWLS CGLALKQWIR 

101 PVMQ FAVPFA ILIAVMQLWV I PWAELRSRE YAEILKQKQE LSLVEAGEFN 

151 NLGKRNGRVY FVETFDTESG IMKNLFLREQ DKNGGDNIIF AKEGNFSLKD 

201 NKRTLELRHG YRYSGTPGRA DYNQVSFQKL NLIISTTPKL IDPVSHRRTI 

251 STAQLIGSSN PQHQAELMWR ISLTVSVLLL CLLAVPL SYF NPRSGHTYNI 

301 LIAIGLFLIY QNGLTL LFEA VEDGKIHFWL GLLPMHIIMF VIAIVLL RVR 

351 SMPSQPFWQA VGKSLTLKGG K* 

ORFlOlng-1 and ORFlOl-1 show 97.6% identity in 371 aa overlap: 



)rf 101-1. pep 
>rfl01ng-l 



MIYQRNLIKELSFTAVGIFWLLAVLVSTQAINLLGRAADGRVAIDAVLALVGFWVIGMT 

I I ! I I I I I I I I I I I I I I I I I I !! I I I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I I I 

MIYQRNLIKELSFTAVGIFWLLAVLVSTQAINLLGRAADGRVAIDAVLALVGFWVIGMT 



orf 101-1. pep 
orf lOlng-1 



PLLLVLTAFISTLTVLTRYWRDSEM3VWLSCGLALKQWIRPVMQFAVPFAVLVAVMQLWV 

I I I I I ! I I I I I I I I I I I I I I I I I I I I I I i I i I I I I I I I 1 I i I I I I I I I I I : I : I I I I I I 
PLLLVLTAFISTLTVLTRYWRDSEMSVWLSCGLALKQWIRPVMQFAVPFAILIAVMQLWV 



orf 101-1. pep 
orflOlng-1 



IPWAELRSREYAEILKQKQELSLVEAGEFNSLGKRNGRVYFVETFDTESGIMKNLFLREQ 
IPWAELRSREYAEILKQKQELSLVEAGEFNNLGKRNGRVYFVETFDTESGIMKNLFLREQ 



orf 101-1 .pep 
orflOlng-1 



DKNGGDNIIFAKEGNFSLNDNKRTLELRHGYRYSGTPGRADYNQVSFQKLNLIISTTPKL 
DKMGGDNIIFAKEGNFSLKDNKRTLELRHGYRYSGTPGRADYNQVSFQKLNLIISTTPKL 



irf 101-1 .pep 
irflOlng-1 



IDPVSHRRTIPTAQLIGSSNPQHQAELMWRISLTVSVLLLCLLAVPLSYFNPRSGHTYNI 
IDPVSHRRTISTAQLIGSSNPQHQAELMWRISLTVSVLLLCLLAVPLSYFNPRSGHTYNI 



orf 101-1 .pep 
orflOlng-1 



LIAIGLFLIYQNGLTLLFEAVEDGKIHFWLGLLPMHIIMFAVALILLRVRSMPSQPFWQA 

I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I I I I I I I I :: I :: I I I I I I I I M I I I i i 
LIAIGLFLIYQNGLTLLFEAVEDGKIHFWLGLLPMHIIMFVIAIVLLRVRSMPSQPFWQA 



orf 101-1 .pep 
orflOlng-1 



VGKSLTLKGGKX 

I I I I I I I I I I I I 
VGKSLTLKGGKX 



Based on this 
and several putative 



including the presence of a putative leader sequence (double-imderUned) 
transmembrane domains (single-underlined) in the gonococcal protein, it is 
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predicted that the proteins from N.meningitidis and N.gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 60 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 507>: 

1 ..GGTGGTGGTT TTATCAATGC TTCCTGTGCC ACTTTGACGA CAGCCAAACC 

51 GCAATATCAA GCAGGAGACC TTAGCGCTTT TAAGATAAGG CAAGGCAATG 

101 TTGTAATCGC CGGACACGGT TTGGATGCAC GTGATACCGA TTACACACGT 

151 ATTCTCAGTT ATCATTCCAA AATCGATGCA CCCGTATGGG GACAAGATGT 

201 TCGTGTCGTC GCGGGACAAA ACGATGTGGC CGCAACAGGT GATGCACATT 

251 CGCCTATTCT CAATAATGCT GCTGCCAATA CGTCAAACAA TACAGCCAAC 

301 AACGGCACAC ATATCCCTTT ATTTGCGATT GATACAGGCA AATTAGGAGG 

351 TAT.GTATGC CAACAAAATC ACCTTGATCA GTACGGTCGA GCAAGCAGGC 

401 ATTCGTAA 

This corresponds to the amino acid sequence <SEQ ID 508; ORFl 13>: 

1 . .GGGFINASCA TLTTAKPQYQ AGDLSAFKIR QGNWIAGHG LDARDTDYTR 
51 ILSYHSKIDA PVWGQDVRW AGQNDVAATG DAHSPILNNA AANTSNNTAN 
101 NGTHIPLFAI DTGKLGGXVC QQNHLDQYGR ASRHS* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with with pspA putative secreted protein of N.meningitidis ("accession AFQ30941) 

ORF and pspA show 44% aa identity in 179aa overlap: 

orfllS GGGFINASCATLTTAKPQYQAGDLSAFKIRQGNWIAGHGLDARDTDYTRILSYHSKIDA 60 

GGG INA+ TLT+ P G+L+ F + G WI G GLD D DYTRILS ++I+A 
pspa GGGLINAASVTLTSGVPVLNNGNLTGFDVSSGKVVIGGKGLDTSDADYTRILSRAAEINA 256 

orfll3 PVWGQDVRWAGQNDVAATGDAHSPILXXXXXXXXXXXXXXGTHIPLFAIDTGKLGGMYA 120 

VWG+DV+VV+G+N + G + P AIDT LGGMYA 

pspa GVWGKDVKVVSGKNKLDFDG SLAKTASAPSSSDSVTPTVAIDTATLGGMYA 307 

orfll3 NKITLISTVEQAGIRNQGQWFASAGNVAVNAEGKLVNTGMIAATGENHAVSLHARNVHN 17 9 

+KITLIST A IRN+G+ FA+ G V ++A+GKL N+G I A +++ A+ V N 

pspa DKITLISTDNGAVIRNKGRIFAATGGVTLSADGKLSNSGSIDAA EITISAQTVDN 362 

Homology with a predicted ORF from N.sonorrhoeae 

0RF113 shows 86.5% identity in 52aa overlap at the N- terminal part and 94.1% identity in 17aa 
overlap at the C-terminal part with a predicted ORF (ORFl 13ng) from N. gonorrhoeae: 

orfll3 GGGFINASCATLTTAKPQYQAGDLSAFKIR 30 

orfllSng SHPSQLNGYIEVGGRRAEWIANPAGIAVNGGGFINASRATLTTGQPQYQAGDFSGFKIR 224 

orfll3 QGNWIAGHGLDARDTDYTRILSYHSKIDAPVWGQDVRWAGQNDVAATGDAHSPILNNA 90 

I I I : I I I I I I I I I 1 I I I :! I I I 
orfll3ng QGNAVIAGHGLDARDTDFTRILVCQQNHLDQYGRTSRHS 2 63 

or f 11 3 IDTGKLGGXVCQQNHLDQYGRASRHS 135 

111111111111:1111 

orfll3ng DFSGFKIRQGNAVIAGHGLDARDTDFTRILVCQQNHLDQYGRTSRHS 263 



The complete length 0RF113ng nucleotide sequence <SEQ ID 509> is predicted to encode a 
protein having amino acid sequence <SEQ ID 510>: 
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1 MNKTLYRVIF NRKRGAVVAV AETTKREGKS CADSGSGSVY VKSVSFIPTH 
51 SKAFCFSALG FSLCLALGTV NIAFADGIIT DKAAPKTQQA TILQTGNGIP 
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101 QVNIQTPTSA GVSVNQYAQF DVGNRGAILN NSRSNTQTQL GGWIQGNPWL 
151 TRGEARWVN QINSSHPSQL NGYIEVGGRR AEWIANPAG lAVNGGGFIN 
201 ASRATLTTGQ PQYQAGDFSG FKIRQGNAVI AGHGLDARDT DFTRILVCQQ 
251 NHLDQYGRTS RHS* 

Based on this analysis, it is predicted that these proteins from N. meningitidis and N.gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 61 

The following partial DNA sequence was identified in N. meningitidis <SEQ E) 51 1>: 

1 . . TCAACGGGAC ATAGCGAACA AAATTACACT TTGCCGCGAG AAATCACACG 

51 CAACATTTCA CTGGGTTCAT TTGCCTATGA ATCGCATCGC AAAGCATTAA 

101 GCCATCATGC GCCCAGCCAA GGCACTGAGT TGCCGCAAAG CAACGGTATT 

151 TCGCTACCCT ATACGTCCAA TTCTTTTACC CCATTACCCA GCAGCAGCTT 

201 ATACATTATC AATCCTGTCA ATAAAGGCTA TCTTGTTGAA ACCGATCCAC 

251 GCTTTGCCAA CTACCGTCAA TGGTTGGGTA GTGACTATAT GCtGGACAGC 

301 CTCAAACTAG ACCCAAACAA TTTACATAAA CGTTTGGGTG ATGGTTATTA 

351 CGAGCAACGT TTAATCAATG AACAAATCGC AGAGCTGACA GGGCATCGTC 

401 GTTTAGAcGG TTATCAAAAC GACGAAGAAC AATTTAAAGC CTTAATGGAT 

451 AATGGCGCGA CTGCGGCACG TTcGATGAAT CTCAGCGTTG GCATTGCATT 

501 AAGTGCCGAG CAAGTAGCGC AACTGACCAG CGATATTGTT TGGTTGGTAC 

551 AAAAAGAAGT TAAGCTTCCT GATGGCGGCA CACAAACCGT ATTGGTGCCA 

601 CAGGTTTATG TACGCGTTAA AAATGGCGAC ATAGACGGTA AAGGTGCATT 

651 GTTGTCAGGC AGCAATACAC AAATCAATGT TTCAGGCAGC CTGAAAAACT 

701 CAGGCACGAT TGCAGGgCGC AATGCGCTTA TTATCAATAC CGATACGCTA 

751 GACAATATCG GTGGGCGTAT TCATGCGCAA AAATCAGCGG TTACGGCCAC 

801 ACAAGACATC AATAATATTG GCGGCATGCT TTCTGCCGAA CAGACATTAT 

851 TGCTCAACGC AGGCAACAAC ATCAACAGCC AAAGCACCAC CGCCAGCAGT 

901 CAAAATACAC AAGGCAGCAG CACCTACCTA GACCGAATGG CAGGTATTTA 

951 TATCACAGGC AAAGAAAAAG GTGTTT . . 

This corresponds to the amino acid sequence <SEQ ID 512; 0RF115>: 

1 ..STGHSEQNYT LPREITRNIS LGSFAYESHR KALSHHAPSQ GTELPQSNGI 

51 SLPYTSNSFT PLPSSSLYII NPVNKGYLVE TDPRFANYRQ WLGSDYMLDS 

101 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD 

151 NGATAARSMN LSVGIALSAE QVAQLTSDIV WLVQKEVKLP DGGTQTVLVP 

201 QVYVRVKNGD IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL 

251 DNIGGRIHAQ KSAVTATQDI NNIGGMLSAE QTLLLNAGNN INSQSTTASS 

301 QNTQGSSTYL DRMAGIYITG KEKGV.. 

Computer analysis of this amino acid sequence gave the following results: 

Homology with the pspA putative secreted protein of N.meninsitidis (accession number AF03094n 
ORFl 15 and pspA protein show 50% aa identity in 325aa overlap: 

OrfllS: 1 STGHSEQNYTLPREITRNISLGSFAYESHRKALSHHAPSQGTELPQSNGISLPYTSNSFT 60 

STG+S Y E++ +1 +G AY+ + + P + NGI +T 
pspA: 778 STGYSRSPYEPAPEVS-SIRMGISAYKGYAPQQASDIPGTWPWAENGIHPTFT 831 

OrfllS: 61 PLPSSSLYIINPVNKGYLVETDPRFANYRQWLGSDYMLDSLKLDPNNLHKRLGDGYYEQR 120 

LP+SSL+ I P NKGYL+ETDP F +YR+WLGS YML +L+ DPN++HKRLGDGYYEQ+ 
pspA: 832 -LPNSSLFAIAPNNKGYLIETDPAFTDYRKWLGSGYMLAALQQDPNHIHKRLGDGYYEQK 890 

OrfllS: 121 LINEQIAELTGHRRLDGYQNDEEQFKALMDNGATAARSMNLSVGIALSAEQVAQLTSDIV 180 

L+NEQIA+LTG+RRLDGY NDEEQFKALMDNG T A+ + L+ GIALSAEQVA+LTSDIV 
pspA: 891 LVNEQIAKLTGYRRLDGYTNDEEQFKALMDNGITIAKELQLTPGIALSAEQVARLTSDIV 950 

OrfllS: 181 WLVQKEVKLPDGGTQTVLVPQVYVRVKNGDIDGKGALLSGSNTQINVSGSLKN-SGTIAG 23 9 

WL + V LPDG TQTVL P+VYVR + D++G+GALLSGS I SG+++N G lAG 
pspA: 951 WLENETVTLPDGTTQTVLKPKVYVRARPKDMNGQGALLSGSWDIG-SGAIENRGGLIAG 1009 
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OrfllS: 240 RNALIINTDTLDNIGGRIHAQKSAVTATQDINNIGGMLSAEQTLLLNAGXXXXXXXXXXX 299 

R ALI+N +N+G++ ADING+AE LLL A 

pspA: 1010 REALILNAQNIKNLQGDLQGKNIFAAAG3DITNTGS-IGAENALLLKA3NNIESRSETRS 1068 

OifllS: 300 XXXXXXXXXYLDRMAGIYITGKEKG 324 

+ R+AGIY+TG++ G 
pspA: 1069 NQNEQGSVRNIGRVAGIYLTGRQNG 1093 

Homology with a predicted ORF from N.sonorrhoeae 

0RF115 shows 91.9% identity over a 334aa overlap with a predicted ORF (ORFllSng) from 
N.gonorrhoeae: 

orfllS.pep STGHSEQNYTLPREITRNISLGSFAYESHRK 31 

orfllSng NEQTFGEKKVFSENGKLHNYWRARRKGHDETGHREQNYTLPEEITRDISLGSFAYESHSK 71 

orfllS.pep ALSHHAPSQGTELPQSN GISLPYTSNSFTPLPSSSLYIINPVNKGYLVET 81 

I I I : I I I I 1 M I I I I I 1 I I j I I I I I I I I I I !: I I I I I ! I I : I I I I I I I 

orfllSng ALSRHAPSQGTELPQSNRDNIRTAKSNGISLPYTPNSFTPLPGSSLYIINPANKGYLVET 131 

orf lis . pep DPRFANYRQWLGSDYMLDSLKLDPNNLHKRLGDGYYEQRLINEQIAELTGHRRLDGYQND 141 

orfllSng DPRFANYRQWLGSDYMLGSLKLDPNNLHKRLGDGYYEQRLINEQIAELTGHRRLDGYQND 191 

orf 115 . pep EEQFKALMDNGATAARSMNLSVGIALSAEQVAQLTSDIVWLVQKEVKLPDGGTQTVLVPQ 201 

orfllSng EEQFKALMDNGATAARSMNLSVGIALSAEQAAQLTSDIVWLVQKEVKLPDGGTQTVLMPQ 251 

orfllS.pep VYVRVKNGDIDGKGALLSGSNTQINVSGSLKNSGTIAGRNALIINTDTLDNIGGRIHAQK 2 61 

I I I I I I I I I I I I I I I I I I I I I I I! I I I I I I I I I I M I I I I I ! I I I I I I I I I I I I I I I I I 
orfllSng VYVRVKNGGIDGKGALLSGSNTQINVSGSLPCNSGTIAGRNALIINTDTLDNIGGRIHAQK 311 

orf lis . pep SAVTATQDINNIGGMLSAEQTLLLNAGNNINSQSTTASSQNTQGSSTYLDRMAGIYITGK 321 

I I 1 I I I I I I I I I I I : I I I I I I I I I ! 1 I I I I I : I I I : I I I I : I I I I I I I I I I I I I I I I I I 
orfllSng SAVTATQDINNIGGILSAEQTLLLNAGNNINNQSTAKSSQNAQGS3TYLDRMAGIYITGK 371 

orfllS.pep EKGV 325 

orfllSng EKGVLAAQAGKDINIIAGQISNQSDQGQTRLQAGRDINLDTVQTGKYQEIHFDADNHTIR 431 

An ORF 1 1 5ng nucleotide sequence <SEQ ID 5 1 3> was predicted to encode a protein having amino 
acid sequence <SEQ ID 5 1 4>: 
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MLVQTEKDGL 
LPEEITRDIS 
SLPYTPNSFT 
LKLDPNNLHK 
NGATAARSMN 
QVYVRVKNGG 
DNIGGRIHAQ 
QNAQGSSTYL 
RLQAGRDINL 
SGNNLNAKAA 
GNKLVITDKA 
QAGNHVRIGT 
NEHTGSTVGS 
NQLNSKTTQT 
MPWRLPMQVG 



HNEQTFGEKK 
LGSFAYESHS 
PLPGSSLYII 
RLGDGYYEQR 
LSVGIALSAE 
IDGKGALLSG 
KSAVTATQDI 
DRMAGIYITG 
DTVQTGKYQE 
EVGSAKGTLA 
QSHHETAQSS 
TQTQSQSETY 
LKGDTTIVAS 
YEQKGLTVAF 
RLFKQAKAPK 



VFSENGKLHN 
KALSRHAPSQ 
NPANKGYLVE 
LINEQIAELT 
QAAQLTSDIV 
SNTQINVSGS 
NNIGGILSAE 
KEKGVLAAQA 
IHFDADNHTI 
VYAKNDITIS 
TFEGKQWLQ 
HQTQKSGLMS 
KHYEQTGSNV 
SSPVTDLAQQ 



YWRARRKGHD 
GTELPQSNRD 
TDPRFANYRQ 
GHRRLDGYQN 
WLVQKEVKLP 
LKNSGTIAGR 
QTLLLNAGNN 
GKDINIIAGQ 
RGSTNEVGSS 
SGIHAGQVDD 
AGNDANILGS 
AGIGFTIGSK 
SSPEGNNLIS 
AIAVAHKAAK 



ETGHREQNYT 
NIRTAKSNGI 
WLGSDYMLGS 
DEEQFKALMD 
DGGTQTVLMP 
NALIINTDTL 
INNQSTAKSS 
ISNQSDQGQT 
IQTKGDVTLL 
ASKHTGRSGG 
NVISDNGTRI 
TNTQENQSQS 
TQSMDIGAAQ 
QFDKAKTTAL 



Further work revealed the following partial gonococcal DNA sequence <SEQ ID 515>: 

1 TTGCTTGTGC AAACAGAAAA AGACGGTTTG CATAACGAGC AAACCTTTGG 

51 CGAGAAGAAA GTCTTCAGCG AAAATGGTAA GTTGCACAAC TACTGGCGTG 

101 CGCGTCGTAA AGGACATGAT GAAACAGGGC ATCGTGAACA AAATTATACT 

151 TTGCCGGAGG AAATCACACG CGACATTTCA CTGGGTTCAT TTGCCTATGA 

201 ATCGCATAGC AAAGCATTAA GCCGTCATGC GCCCAGCCAA GGCACTGAGT 

251 TGCCACAAAG TAACCGGGAT AATATCCGTA CTGCGAAAAG CAACGGTATT 
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301 TCGCTACCCT ATACGCCCAA TTCTTTTACC CCATTACCCG GCAGCAGCTT 

351 ATACATTATC AATCCTGCCA ATAAAGGCTA TCTTGTTGAA ACCGATCCAC 

401 GCTTTGCCAA CTACCGTCAA TGGTTGGGTA GTGACTATAT GCTGGGCAGC 

451 CTCAAACTAG ACCCAAACAA TTTACATAAA CGTTTGGGTG ATGGTTATTA 

501 CGAGCAACGT TTAATCAATG AACAAATCGC AGAGCTGACA GGGCATCGTC 

551 GTTTAGACGG TTATCAAAAC GACGAAGAAC AATTTAAAGC CTTAATGGAT 

601 AATGGCGCGA CTGCGGCACG TTCGATGAAT CTCAGCGTTG GCATTGCATT 

651 AAGTGCCGAG CAAGCAGCGC AACTGACCAG CGATATTGTT TGGTTGGTAC 

701 AAAAAGAAGT TAAACTTCCT GATGGCGGCA CACAAACCGT ATTGATGCCA 

751 CAGGTTTATG TACGCGTTAA AAATGGCGGC ATAGACGGTA AAGGTGCATT 

801 GTTGTCAGGC AGCAATACAC AAATCAATGT TTCAGGCAGC CTGAAAAACT 

851 CAGGCACGAT TGCAGGGCGC AATGCGCTTA TTATCAATAC CGATACGCTA 

901 GACAATATCG GTGGGCGTAT TCATGCGCAA AAATCAGCGG TTACGGCCAC 

951 ACAAGACATC AATAATATTG GCGGCATTCT TTCTGCCGAA CAGACATTAT 

1001 TGCTCAATGC GGGTAACAAC ATCAACAACC AAAGCACGGC CAAGAGCAGT 

1051 CAAAATGCAC AAGGTAGCAG CACCTACCTA GACCGAATGG CAGGTATTTA 

1101 TATCACAGGC AAAGAAAAAG GTGTTTTAGC AGCGCAGGCA GGCAAAGACA 

1151 TCAACATCAT TGCCGGTCAA ATCAGCAATC AATCAGATCA AGGGCAAACC 

1201 CGGCTGCAGG CAGGACGCGA CATTAACCTG GATACGGTAC AAACCGGCAA 

1251 ATATCAAGAA ATCCATTTTG ATGCCGATAA CCATACCATC CGAGGTTCAA 

1301 CGAACGAAGT CGGCAGCAGC ATTCAAACAA AAGGCGATGT TACCCtatTG 

1351 TCAGGGAATA ATCTCAATGC CAAAGCTGCC GAAGTCGGCA GCGCAAAAGG 

1401 CACACTTGCC GTGTATGCTA AAAATGACAT TACTATCAGC TCAGGCATCC 

1451 ATGCCGGCCA AGTTGATGAT GCGTCCAAAC ATACAGGCAG AAGCGGCGGC 

1501 GGTAATAAAT TAGTCATTAC CGATAAAGCC CAAAGTCATC ACGAAACTGC 

1551 TCAAAGCAGC ACCTTTGAAG GCAAGCAAGT TGTATTGCAG GCAGGAAACG 

1601 ATGCCAACAT CCTTGGCAGT AATGTTATTT CCGATAATGG CACCCGGATT 

1651 CAAGCAGGCA ATCATGTTCG CATTGGTACA ACCCAAACTC AAAGCCAAAG 

1701 CGAAACCTAT CATCAAACCC AAAAATCAGG ATTGATGAGT GCAGGTATCG 

1751 GCTTCACTAT TGGCAGCAAG ACAAACACAC AAGAAAACCA ATCCCAAAGC 

1801 AACGAACATA CAGGCAGTAC CGTAGGCAGC CTGAAAGGCG ATACCACCAT 

1851 TGTTGCAAGC AAACACTACG AACAAACCGG CAGCAACGTT TCCAGCCCTG 

1901 AGGGCAACAA CCTTATCAGC ACGCAAAGTA TGGATATTGG CGCAGCACAA 

1951 AACCAATTAA ACAGCAAAAC CACCCAAACC TACGAACAAA AAGGCTTAAC 

2001 GGTGGCATTC AGTTCGCCCG TTACCGATTT GGCACAACAA GCGATTGCCG 

2051 TAGCACACAA AGCAGCAAAC AAGTCGGACA AAGCAAAAAC GACCGCGTTA 

2101 ATGCCATGGC GGCTGCCAAT GCAGGTTGGC AGGCCTATCA AACAGGCAAA 

2151 GGCGCACAAA ACTTAG 

This corresponds to the amino acid sequence <SEQ ID 516; ORFl 15ng-l>: 

1 LLVQTEKDGL HNEQTFGEKK VFSENGKLHN YWRARRKGHD ETGHREQNYT 

51 LPEEITRDIS LGSFAYESHS KALSRHAPSQ GTELPQSNRD NIRTAKSNGI 

101 SLPYTPNSFT PLPGSSLYII NPANKGYLVE TDPRFANYRQ WLGSDYMLGS 

151 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFPCALMD 

201 NGATAARSMN LSVGIALSAE QAAQLTSDIV WLVQKEVKLP DGGTQTVLMP 

251 QVYVRVKNGG IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL 

301 DNIGGRIHAQ KSAVTATQDI NNIGGILSAE QTLLLNAGNN INNQSTAKSS 

351 QNAQGSSTYL DRMAGIYITG KEKGVLAAQA GKDINIIAGQ ISNQSDQGQT 

4 01 RLQAGRDINL DTVQTGKYQE IHFDADNHTI RGSTNEVGSS IQTKGDVTLL 

451 SGNNLNAKAA EVGSAKGTLA VYAKNDITIS SGIHAGQVDD ASKHTGRSGG 

501 GNKLVITDKA QSHHETAQSS TFEGKQWLQ AGNDANILGS NVISDNGTRI 

551 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS 

601 NEHTGSTVGS LKGDTTIVAS KHYEQTGSNV SSPEGNNLIS TQSMDIGAAQ 

651 NQLNSKTTQT YEQKGLTVAF SSPVTDLAQQ AIAVAHKAAN KSDKAKTTAL 

701 MPWRLPMQVG RPIKQAKAHK T* 

This gonococcal protein (0RF115ng-l) shows 91.9% identity with ORFl 15 over 334aa: 

20 30 40 50 60 70 

orf 115ng-l . p NEQTFGEKKVFSENGKLHNYWRARRKGHDETGHREQNYTLPEEITRDISLGSFAYESHSK 

orfll5 STGHSEQNYTLPREITRNISLGSFAYESHRK 

10 20 30 

80 90 100 110 120 130 

orf 115ng-l . p ALSRHAPSQGTELPQSNRDNIRTAKSNGISLPYTPNSFTPLPGSSLYIINPANKGYLVET 
I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I : I I I I I I I I 

orf 115 ALSHHAPSQGTELPQSN GISLPYTSNSFTPLPSSSLYIINPVNKGYLVET 

40 50 60 70 80 
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)rfll5ng-l.p DPRFANYRQWLGSDYMLGSLKLDPNNLHKRLGDGYYEQRLINEQIAELTGHRRLDGYQND 
irfllS DPRFANYRQWLGSDYMLDSLKLDPNNLHKRLGDGYYEQRLINEQIAELTGHRRLDGYQND 



) EEQFKALMDNGATAARSMNLSVGIALSAEQAAQLTSDIVWLVQKEVKLPDGGTQTVLMPQ 
EEQFKALMDNGATAARSMNLSVGIALSAEQVAQLTSDIVWLVQKEVKLPDGGTQTVLVPQ 



.15ng-l.p VYVRVKNGGIDGKGALLSGSNTQINVSGSLKNSGTIAGRNALIINTDTLDNIGGRIHAQK 
.15 VYVRVKNGDIDGKGALLSGSNTQINVSGSLKNSGTIAGRNALIINTDTLDNIGGRIHAQK 



irfll5ng-l .p SAVTATQDINNIGGILSAEQTLLLNAGNNINNQSTAKSSQNAQGSSTYLDRMAGIYITGK 
.rf 115 SAVTATQDINNIGGMLSAEQTLLLNAGNNINSQSTTASSQNTQGSSTYLDRMAGI^ 



270 



380 



28( 



390 



300 



310 



320 



400 



orfll5ng-l.p EKGVLAAQAGKDINIIAGQI3NQSDQGQTRLQAGRDINLDTVQTGKYQEIHFDADNHTIR 
I I I I 

orfll5 EKGV 

In addition, it shows homology with a secreted N. meningitidis protein in the database: 

putative secreted protein [Neisseria meningitidis] Length 



gi|2623258 {AF030941) 
= 2273 

Score = 604 bits (1541) , Expect = 
Identities = 325/678 (47%), Positi 



1-172 



= 449/678 (65%), Gaps = 22/678 (3%) 



Query: 1 LLVQTEKDGLHNEQTFGEKKVFSENGKLHNYWRARRKGHDETGHREQNYTLPEEITRDIS 60 

L+V T + L N++T G K + ++ G LH Y R +KG D TG+ Y E++ I 
Sbjct: 739 LIVGTPESALDNDETLGTKTI-TDKGDLHRYHRHHKKGRDSTGYSRSPYEPAPEVS-SIR 796 

Qtery: 61 LGSFAYESHSKALSRHAPSQGTELPQSNRDNIRTAKSNGISLPYTPNSFTPLPGSSLYII 120 

+G AY+ + AP Q +++P + + NGI +T LP SSL+ I 

Sbjct: 797 MGISAYKGY APQQASDIPGTV VPWAENGIHPTFT LPNSSLFAI 840 

Query: 121 NPANKGYLVETDPRFANYRQWLGSDYMLGSLKLDPNNLHKRLGDGYYEQRLINEQIAELT 180 

P NKGYL+ETDP F +YR+WLGS YML +L+ DPN++HKRLGDGYYEQ+L+NEQIA+LT 
Sbjct: 841 APNNKGYLIETDPAFTDYRKWLGSGYMLAALQQDPNHIHKRLGDGYYEQKLVNEQIAKLT 900 

Query: 181 GHRRLDGYQNDEEQFKALMDNGATAARSMNLSVGIALSAEQAAQLTSDIVWLVQKEVKLP 240 

G+RRLDGY NDEEQFKALMDNG T A+ + L+ GIALSAEQ A+LTSDIVWL + V LP 
Sbjct: 901 GYRRLDGYTNDEEQFKALMDNGITIAKELQLTPGIALSAEQVARLTSDIVWLENETVTLP 960 

Query: 241 DGGTQTVLMPQVYVRVKNGGIDGKGALLSGSNTQINVSGSLKN-SGTIAGRNALIINTDT 299 

DG TQTVL P+VYVR + ++G+GALLSGS I SG+++N G lAGR ALI+N 
Sbjct: 961 DGTTQTVLKPKVYVRARPKDMNGQGALLSGSWDIG-SGAIENRGGLIAGREALILNAQN 1019 

Query: 300 LDNIGGRIHAQKSAVTATQDINNIGGILSAEQTLLLNAGNNINNQSTAKSSQNAQGSSTY 359 

+ N+ G + + A DI N G I AE LLL A NNI ++S +S+QN QGS 

Sbjct: 1020 IKNLQGDLQGKNIFAAAGSDITNTGSI-GAENALLLKASNNIESRSETRSNQNEQGSVRN 1078 

Query: 360 LDRMAGIYITGKEKGVLAAQAGKDINIIAGQISNQ3DQGQTRLQAGRDINLDTVQTGKYQ 419 

+ R+AGIY+TG++ G + AG +1 + A +++NQS+ GQT L AG DI DT + Q 
Sbjct: 1079 IGRVAGIYLTGRQNGSVLLDAGNNIVLTASELTNQSEDGQTVLNAGGDIRSDTTGISRNQ 1138 

Query: 420 EIHFDADNHTIRGSTNEVGSSIQTKGDVTLLSGNNLNAKAAEVGSAKGTLAVYAKNDITI 479 

FD+DN+ IR NEVGS+I+T+G+++L + ++ +AAEVGS +G L + A DI + 
Sbjct: 1139 NTIFDSDNYVIRKEQNEVGSTIRTRGNLSLNAKGDIRIRAAEVGSEQGRLKLAAGRDIKV 1198 

Query: 480 SSGIHAGQVDDASKHTGRSGGGNKLVITDKAQSHHETAQSSTFEGKQWLQAGNDANILG 539 

+G + +DA K+TGRSGGG K +T ++ + A S T +GK+++L +G D + G 
Sbjct: 1199 EAGKAHTETEDALKYTGRSGGGIKQKMTRHLKNQNGQAVSGTLDGKEIILVSGRDITVTG 1258 
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Query: 


540 


Sbjct: 


1259 




599 


Sbjct: 


1319 




659 


Sbjct: 


1379 


ionthi 


analy 



their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 62 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 517>: 

1 . . TCAGGGAATA ACCTCAATGC CiyU^GCTGCC GAAGTCAGCA GCGCAAACGG 

51 TACACTCGCT GTGTCTGCCA ATAATGACAT CAACATCAGC GCAGGCATCA 

101 ACACGACCCA TGTTGATGAT GCGTCCAAAC ACACAGGCAG AAGCGGTGGT 

151 GGCAATAAAT TAGTCATTAC CGATAAAGCC CAAAGTCATC ACGAAACCGC 

201 CCAAAGCAGC ACCTTTGAAG GCAAGCAAGT TGTATTGCAG GCAGGAAACG 

251 ATGCCAACAT CCTTGGCAGC AATGTTATTT CCGATAATGG CACCCAGATT 

301 CAAGCAGGCA ATCATGTTCG CATTGGTACA ACCCAAACTC AAAGCCAAAG 

351 CGAAACCTAT CATCAAACCC AGAAATCAGG ATTGATGAGT GCAGGTATCG 

4 01 GCTTCACTAT TGGCAGCAAG ACAAACACAC AAGAAAACCA ATCCCAAAGC 

4 51 AACGAACATA CAGGCAGTAC CGTAGGCAGC TTGAAAGGCG ATACCACCAT 

501 TGTTGCAGGC AAACACTACG AACAAATCGG CAGTACCGTT TCCAGCCCGG 

551 AAGGCAACAA TACCATCTAT GCCCAAAGCA TAGACATTCA AGCGGCACAC 

601 AACAAATTAA ACAGTAATAC CACCCAAACC TATGAACAAA AAGG . CTAAC 

651 GGTGGCATTC AGTTCGCCCG TTACCGATTT GGCACAACAA . . . 

This corresponds to the amino acid sequence <SEQ ID 518; 0RF117>: 

1 ..SGNNLNAKAA EVSSANGTLA VSANNDINIS AGINTTHVDD ASKHTGRSGG 

51 GNKLVITDKA QSHHETAQSS TFEGKQWLQ AGNDANILGS NVISDNGTQI 

101 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS 

151 NEHTGSTVGS LKGDTTIVAG KHYEQIGSTV SSPEGNNTIY AQSIDIQAAH 

201 NKLNSNTTQT YEQKXLTVAF SSPVTDLAQQ . . . 

Computer analysis of this amino acid sequence gave the following results: 

Homology with the pspA putative secreted protein of Kmeninsitidis (accession number AFQ30941) 
ORFl 17 and pspA protein show 45% aa identity in 224aa overlap: 

Orfll7: 4 NLNAKAAEVSSANGTLAVSANNDINISAGINTTHVDDASKHTGRSGGGNKLVITDKAQSH 63 

++ +AAEV S G L ++A DI + AG T +DA K+TGRSGGG K +T ++ 
pspA: 1173 DIRIRAAEVGSEQGRLKLAAGRDIKVEAGKAHTETEDALKYTGRSGGGIKQKMTRHLKNQ 1232 

Orfll7: 64 HETAQSSTFEGKQVVLQAGNDANILGSNVISDNGTQIQAGNHVRIGTTQTQSQSETYHQT 123 

+ AST +GK+++L +G D + GSN+I+DN T + A N++ + +T+S+S ++ 
pspA: 1233 NGQAVSGTLDGKEIILVSGRDITVTGSNIIADNHTILSAKNNIVLKAAETRSRSAEMNKK 1292 

Orfll7: 124 QKSGLM-SAGIGFTIGSKTNTQENQSQSNEHTGSTVGSLKGDTTIVAGKHYEQIGSTVSS 182 

+KSGLM S GIGFT GSK +TQ N+S++ HT S VGSL G+T I AGKHY Q GST+SS 
pspA: 12 93 EKSGLMGSGGIGFTAGSKKDTQTNRSETVSHTESVVGSLNGNTLISAGKHYTQTGSTISS 1352 



Orfll7: 
pspA: 



183 PEGNNTIYAQSIDIQAAHNKLNSNTTQTYEQKXLTVAFSSPVTD 226 

P+G+ 1+ IIAAN++ +Q YEQK +TVA S PV + 
1353 PQGDVGISSGKISIDAAQNRYSQESKQVYEQKGVTVAISVPVVN 1396 
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Homology with a predicted ORF from N.sonorrhoeae 

0RF117 shows 90% identity over a 230aa overlap with a predicted ORF (0RF117ng) from 
N. gonorrhoeae: 



orfll7.pep SGNNLNAKAAEVSSANGTLAVSANNDINIS 30 

orfll7ng IHFDADNHTIRGSTNEVGSSIQTKGDVTLLSGNNLNAPCAAEVGSAKGTLAVYAKNDITIS 480 

orf 117 .pep AGINTTHVDDASKHTGRSGGGNKLVITDKAQSHHETAQSSTFEGKQWLQAGNDANILGS 90 

orfll7ng SGIHAGQVDDASKHTGRSGGGNKLVITDKAQSHHETAQSSTFEGKQVVLQAGNDANILGS 540 

orf 117 .pep NVISDNGTQIQAGNHVRIGTTQTQSQSETyHQTQKSGLMSAGIGFTIGSKTNTQENQSQS 150 

orfll7ng NVISDNGTRIQAGNHVRIGTTQTQSQSETYHQTQKSGLMSAGIGFTIGSKTNTQENQSQS 600 

orf 117 .pep NEHTGSTVGSLKGDTTIVAGKHYEQIGSTVSSPEGNNTIYAQSIDIQAAHNKLNSNTTQT 210 

orfll7ng NEHTGSTVGSLKGDTTIVASKHYEQTGSNVSSPEGNNLISTQSMDIGAAQNQLNSKTTQT 660 

orfll7.pep YEQKXLTVAFSSPVTDLAQQ 230 

or f 1 1 7 ng YEQKGLTVAFS S PVT DLAQQAIAVAHKAAKQFDKAKTTALMPWRLPMQVGRLFKQAKAPK 720 

An ORFl 17ng nucleotide sequence <SEQ E) 519> was predicted to encode a protein having amino 
acid sequence <SEQ ID 520>: 



1 ..LLVQTEKDGL HNEQTFGEKK VFSENGKLHN YWRARRKGHD ETGHREQNYT 
51 LPEEITRDIS LGSFAYESHS KALSRHAPSQ GTELPQSNRD NIRTAKSNGI 
101 SLPYTPNSFT PLPGSSLYII NPANKGYLVE TDPRFANYRQ WLGSDYMLGS 
151 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD 
201 NGATAARSMN LSVGIALSAE QAAQLTSDIV WLVQKEVKLP DGGTQTVLMP 
251 QVYVRVKNGG IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL 
301 DNIGGRIHAQ KSAVTATQDI NNIGGILSAE QTLLLNAGNN INNQSTAKSS 
351 QNAQGSSTYL DRMAGIYITG KEKGVLAAQA GKDINIIAGQ ISNQSDQGQT 
401 RLQAGRDINL DTVQTGKYQE IHFDADNHTI RGSTNEVGSS IQTKGDVTLL 
451 SGNNLNAKAA EVGSAKGTLA VYAKNDITIS SGIHAGQVDD ASKHTGRSGG 
501 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTRI 
551 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS 
601 NEHTGSTVGS LKGDTTIVAS KHYEQTGSNV SSPEGNNLIS TQSMDIGAAQ 
651 NQLNSKTTQT YEQKGLTVAF SSPVTDLAQQ AIAVAHKAAK QFDKAKTTAL 
701 MPWRLPMQVG RLFKQAKAPK K* 

Further work revealed the following gonococcal partial DNA sequence <SEQ ID 521>: 

1 TTGCTTGTGC AAACAGAAAA AGACGGTTTG CATAACGAGC AAACCTTTGG 

51 CGAGAAGAAA GTCTTCAGCG AAAATGGTAA GTTGCACAAC TACTGGCGTG 

101 CGCGTCGTAA AGGACATGAT GAAACAGGGC ATCGTGAACA AAATTATACT 

151 TTGCCGGAGG AAATCACACG CGACATTTCA CTGGGTTCAT TTGCCTATGA 

201 ATCGCATAGC AAAGCATTAA GCCGTCATGC GCCCAGCCAA GGCACTGAGT 

251 TGCCACAAAG TAACCGGGAT AATATCCGTA CTGCGAAAAG CAACGGTATT 

301 TCGCTACCCT ATACGCCCAA TTCTTTTACC CCATTACCCG GCAGCAGCTT 

351 ATACATTATC AATCCTGCCA ATAAAGGCTA TCTTGTTGAA ACCGATCCAC 

401 GCTTTGCCAA CTACCGTCAA TGGTTGGGTA GTGACTATAT GCTGGGCAGC 

451 CTCAAACTAG ACCCAAACAA TTTACATAAA CGTTTGGGTG ATGGTTATTA 

501 CGAGCAACGT TTAATCAATG AACAAATCGC AGAGCTGACA GGGCATCGTC 

551 GTTTAGACGG TTATCAAAAC GACGAAGAAC AATTTAAAGC CTTAATGGAT 

601 AATGGCGCGA CTGCGGCACG TTCGATGAAT CTCAGCGTTG GCATTGCATT 

651 AAGTGCCGAG CAAGCAGCGC AACTGACCAG CGATATTGTT TGGTTGGTAC 

701 AAAAAGAAGT TAAACTTCCT GATGGCGGCA CACAAACCGT ATTGATGCCA 

751 CAGGTTTATG TACGCGTTAA AAATGGCGGC ATAGACGGTA AAGGTGCATT 

801 GTTGTCAGGC AGCAATACAC AAATCAATGT TTCAGGCAGC CTGAAAAACT 

851 CAGGCACGAT TGCAGGGCGC AATGCGCTTA TTATCAATAC CGATACGCTA 

901 GACAATATCG GTGGGCGTAT TCATGCGCAA AAATCAGCGG TTACGGCCAC 

951 ACAAGACATC AATAATATTG GCGGCATTCT TTCTGCCGAA CAGACATTAT 

1001 TGCTCAATGC GGGTAACAAC ATCAACAACC AAAGCACGGC CAAGAGCAGT 

1051 CAAAATGCAC AAGGTAGCAG CACCTACCTA GACCGAATGG CAGGTATTTA 
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1101 TATCACAGGC AAAGAAAAAG GTGTTTTAGC AGCGCAGGCA GGCAAAGACA 

1151 TCAACATCAT TGCCGGTCAA ATCAGCAATC AATCAGATCA AGGGCAAACC 

1201 CGGCTGCAGG CAGGACGCGA CATTAACCTG GATACGGTAC AAACCGGCAA 

1251 ATATCAAGAA ATCCATTTTG ATGCCGATAA CCATACCATC CGAGGTTCAA 

1301 CGAACGAAGT CGGCAGCAGC ATTCAAACAA AAGGCGATGT TACCCtatTG 

1351 TCAGGGAATA ATCTCAATGC CAAAGCTGCC GAAGTCGGCA GCGCAAAAGG 

1401 CACACTTGCC GTGTATGCTA AAAATGACAT TACTATCAGC TCAGGCATCC 

1451 ATGCCGGCCA AGTTGATGAT GCGTCCAAAC ATACAGGCAG AAGCGGCGGC 

1501 GGTAATAAAT TAGTCATTAC CGATAAAGCC CAAAGTCATC ACGAAACTGC 

1551 TCAAAGCAGC ACCTTTGAAG GCAAGCAAGT TGTATTGCAG GCAGGAAACG 

1601 ATGCCAACAT CCTTGGCAGT AATGTTATTT CCGATAATGG CACCCGGATT 

1651 CAAGCAGGCA ATCATGTTCG CATTGGTACA ACCCAAACTC AAAGCCAAAG 

1701 CGAAACCTAT CATCAAACCC AAAAATCAGG ATTGATGAGT GCAGGTATCG 

1751 GCTTCACTAT TGGCAGCAAG ACAAACACAC AAGAAAACCA ATCCCAAAGC 

1801 AACGAACATA CAGGCAGTAC CGTAGGCAGC CTGAAAGGCG ATACCACCAT 

1851 TGTTGCAAGC AAACACTACG AACAAACCGG CAGCAACGTT TCCAGCCCTG 

1901 AGGGCAACAA CCTTATCAGC ACGCAAAGTA TGGATATTGG CGCAGCACAA 

1951 AACCAATTAA ACAGCAAAAC CACCCAAACC TACGAACAAA AAGGCTTAAC 

2001 GGTGGCATTC AGTTCGCCCG TTACCGATTT GGCACAACAA GCGATTGCCG 

2051 TAGCACACAA AGCAGCAAAC AAGTCGGACA AAGCAAAAAC GACCGCGTTA 

2101 ATGCCATGGC GGCTGCCAAT GCAGGTTGGC AGGCCTATCA AACAGGCAAA 

2151 GGCGCACAAA ACTTAG 

This corresponds to the amino acid sequence <SEQ ID 522; ORFl 17ng-l>: 



1 LLVQTEKDGL HNEQTFGEKK VFSENGKLHN YWRARRKGHD ETGHREQNYT 

51 LPEEITRDIS LGSFAYESHS KALSRHAPSQ GTELPQSNRD NIRTAKSNGI 

101 SLPYTPNSFT PLPGSSLYII NPANKGYLVE TDPRFANYRQ WLGSDYMLGS 

151 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD 

201 NGATAARSMN LSVGIALSAE QAAQLTSDIV WLVQKEVKLP DGGTQTVLMP 

251 QVYVRVKNGG IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL 

301 DNIGGRIHAQ KSAVTATQDI NNIGGILSAE QTLLLNAGNN INNQSTAKSS 

351 QNAQGSSTYL DRMAGIYITG KEKGVLAAQA GKDINIIAGQ ISNQSDQGQT 

401 RLQAGRDINL DTVQTGKYQE IHFDADNHTI RGSTNEVGSS IQTKGDVTLL 

451 SGNNLNAKAA EVGSAKGTLA VYAKNDITIS SGIHAGQVDD ASKHTGRSGG 

501 GNKLVITDKA QSHHETAQSS TFEGKQWLQ AGNDANILGS NVISDNGTRI 

551 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS 

601 NEHTGSTVGS LKGDTTIVAS KHYEQTGSNV SSPEGNNLIS TQSMDIGAAQ 

651 NQLNSKTTQT YEQKGLTVAF SSPVTDLAQQ AIAVAHKAAN KSDKAKTTAL 

701 MPWRLPMQVG RPIKQAKAHK T* 

0RF117ng-l shows the same 90% identity over a 230aa overlap with ORFl 17. In addition, it 
shows homology with a secreted N. meningitidis protein in the database: 

gi 12623258 (AF030941) putative secreted protein [Neisseria meningitidis 1 Length = 
2273 

Score = 604 bits (1541), Expect = e-172 

Identities = 325/678 (47%), Positives = 449/678 (65%), Gaps = 22/578 (3%) 

LLVQTEKDGLHNEQTFGEKKVFSENGKLHNYWRARRKGHDETGHREQNYTLPEEITRDIS 60 
L+V T + L N++T G K + ++ G LH Y R +KG D TG+ Y E++ I 
LIVGTPESALDNDETLGTKTI-TDKGDLHRYHRHHKKGRDSTGYSRSPYEPAPEVS-SIR 7 96 

LGSFAYESHSKALSRHAPSQGTELPQSNRDNIRTAKSNGISLPYTPNSFTPLPGSSLYII 120 
+G AY+ + AP Q +++P + + NGI +T LP SSL+ I 

MGISAYKGY APQQASDIPGTV VPWAENGIHPTFT LPNSSLFAI 84 0 

NPANKGYLVETDPRFANYRQWLGSDYMLG3LKLDPNNLHKRLGDGYYEQRLINEQIAELT 180 

P NKGYL+ETDP F +YR+WLGS YML +L+ DPN++HKRLGDGYYEQ+L+NEQIA+LT 
APNNKGYLIETDPAFTDYRKWLGSGYMLAALQQDPNHIHKRLGDGYYEQKLVNEQIAKLT 900 

GHRRLDGYQNDEEQFKALMDNGATAARSMNLSVGIALSAEQAAQLTSDIVWLVQKEVKLP 240 
G+RRLDGY NDEEQFKALMDNG T A+ + L+ GIALSAEQ A+LTSDIVWL + V LP 
GYRRLDGYTNDEEQFKALMDNGITIAKELQLTPGIALSAEQVARLTSDIVWLENETVTLP 960 

DGGTQTVLMPQVYVRVKNGGIDGKGALLSGSNTQINVSGSLKN-SGTIAGRNALIINTDT 299 
DG TQTVL P+VYVR + ++G+GALLSGS I SG+++N G lAGR ALI+N 
DGTTQTVLKPKVYVRARPKDMNGQGALLSGSWDIG-SGAIENRGGLIAGREALILNAQN 1019 

LDNIGGRIHAQKSAVTATQDINNIGGILSAEQTLLLNAGNNINNQSTAKSSQNAQGSSTY 359 





1 


Sbjct: 


739 




61 


Sbjct: 


797 




121 


Sbjct: 






181 


Sbjct: 


901 


Query: 


241 


Sbjct: 


961 


Query: 


300 
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3 60 LDRMAGIYITGKEKGVLAAQAGKDINIIAGQISNQSDQGQTRLQAGRDINLDTVQTGKYQ 419 

+ R+AGIY+TG++ G + AG +1 + A +++NQS+ GQT L AG DI DT + Q 
107 9 IGRVAGIYLTGRQNGSVLLDAGNNIVLTASELTNQSEDGQTVLNAGGDIRSDTTGISRNQ 1138 

420 EIHFDADNHTIRGSTNEVGSSIQTKGDVTLLSGNNLNAKAAEVGSAKGTLAVYAKNDITI 479 

FD+DN+ IR NEVGS+H-T+G+++L + ++ +AAEVGS +G L + A DI + 
1139 NTIFDSDNYVIRKEQNEVGSTIRTRGNLSLNAKGDIRIRAAEVGSEQGRLKLAAGRDIKV 1198 

480 SSGIHAGQVDDASKHTGRSGGGNKLVITDKAQSHHETAQSSTFEGKQWLQAGNDANILG 539 

+G + +DA K+TGRSGGG K +T ++ + A S T +GK+++L +G D + G 
119 9 EAGKAHTETEDALKYTGRSGGGIKQKMTRHLKNQNGQAVSGTLDGKEIILVSGRDITVTG 1258 

54 0 SNVISDNGTRIQAGNHVRIGTTQTQSQSETYHQTQKSGLM-SAGIGFTIGSKTNTQENQS 598 

SN+I+DN T + A N++ + +T+S+S ++ +KSGLM S GIGFT G3K +TQ N+S 
1259 SNIIADNHTILSAKNNIVLKAAETRSRSAEMNKKEKSGLMGSGGIGFTAGSKKDTQTNRS 1318 

599 QSNEHTGSTVGSLKGDTTIVASKHYEQTG3NVSSPEGNNLISTQSMDIGAAQNQLNSKTT 658 

++ HT S VGSL G+T I A KHY QTGS +SSP+G+ IS+ + I AAQN+ + ++ 
1319 ETVSHTESWGSLNGNTLISAGKHYTQTGSTISSPQGDVGISSGKISIDAAQNRY3QESK 1378 

Query: 659 QTYEQKGLTVAFSSPVTD 67 6 

Q YEQKG+TVA S PV + 
Sbjct: 1379 QVYEQKGVTVAISVPVVN 1396 

Based on this analysis, it is predicted that the proteins from N. meningitidis and N.gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Sbjct: 
Sbjct: 
Sbjct: 



Example 63 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 523>: 



1 ATGATTTACA TCGTACTGTT TCTAGCTGTC GTCCTCGCCG TTGTCGCCTA 

51 CAACATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG 

101 GACACTCCGA CAAAGATGCC CTGCTCAACA GCAwAACCAG CCATGTCCGC 

151 GACGGCAAAC CGTCCGGCGG GTCAGTCATG ATGCCGAAAC CCCAACCGGC 

201 GGTCAAAAAA ACGGCAAAAC CCCAAGACCC CGyCATGCGC AACCTGCAAG 

251 AACAGGATGC CGTCTACATC GCCAAGCAGA AACAGGCAAA AGCCTCCCCG 

301 TTCAAAACCG AAATCGAAAC CGCCTTGGAA GAAAGCGGCA TTATCGGCAA 

351 CTCCGCCCAC ACCGTTTCCG AACCCCAAAC CGGACATTCC GCAACGAAAC 

401 CTGCCGACGC GTCGGCAAAA CCTGCACCCG TTCCGCAAAC ACCTGCAAAA 

451 CCGCTGATTA CGCTCAAAGA ACTGTCAAAA GTCGAATTAT CCTGGTTTGA 

501 CGTGCGCATC GACTTCATCT CCTAT . . . 

This corresponds to the amino acid sequence <SEQ ID 524; ORFl 19>: 



1 MXYIVLFLAV VLAWAYNMY QENQYRKKVR DQFGHSDKDA LLNSXTSHVR 

51 DGKPSGGSVM MPKPQPAVKK TAKPQDPXMR NLQEQDAVYI AKQKQAKASP 

101 FKTEIETALE ESGIIGNSAH TVSEPQTGHS ATKPADASAK PAPVPQTPAK 

151 PLITLKELSK VELSWFDVRI DFISY... 

Further work revealed the complete nucleotide sequence <SEQ ID 525>: 

1 ATGATTTACA TCGTACTGTT TCTAGCTGTC GTCCTCGCCG TTGTCGCCTA 

51 CAACATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG 

101 GACACTCCGA CAAAGATGCC CTGCTCAACA GCAAAACCAG CCATGTCCGC 

151 GACGGCAAAC CGTCCGGCGG GTCAGTCATG ATGCCGAAAC CCCAACCGGC 

201 GGTCAAAAAA ACGGCAAAAC CCCAAGACCC CGCCATGCGC AACCTGCAAG 

251 AACAGGATGC CGTCTACATC GCCAAGCAGA AACAGGCAAA AGCCTCCCCG 

301 TTCAAAACCG AAATCGAAAC CGCCTTGGAA GAAAGCGGCA TTATCGGCAA 

351 CTCCGCCCAC ACCGTTTCCG AACCCCAAAC CGGACATTCC GCACCGAAAC 

4 01 CTGCCGACGC GCCGGCAAAA CCTGCACCCG TTCCGCAAAC ACCTGCAAAA 

451 CCGCTGATTA CGCTCAAAGA ACTGTCAAAA GTCGAATTAC CCTGGTTTGA 

501 CGTGCGCTTC GACTTCATCT CCTATATCGC GCTGACCGAA GCCAAAGAAC 

551 TGCACGCACT GCCGCGCCTT TCCAACCGCT GCCGCTACCA GATTGTCGGC 

601 TGCACCATGG ACGACCATTT CCAGATTGCC GAACCCATCC CGGGCATCCG 
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651 CTATCAGGCA TTTATCGTGG GTATTCAGGC AGTCAGCCGC AACGGACTTG 

701 CCTCGCAGGA AGAACTCTCC GCATTCAACC GCCAGGTGGA CGCATTCGCA 

751 CAAAGCATGG GCGGTCAGAC GCTGCACACC GACCTTGCCG CCTTTATCGA 

801 AGTGGCTTCC GCACTGGACG CATTCTGCGC GCGCGTCGAC CAGACCATCG 

851 CCATCCATTT GGTTTCCCCG ACCAGCATCA GCGGCGTAGA ACTGCGTTCC 

901 GCCGTAACGG GCGTGGGTTT CGTTTTGGAA GACGACGGCG CGTTCCACTA 

951 TACCGACACG TCGGGCTCGA CCATGTTCTC CATCTGCTCG CTCAACAACG 

1001 AGCCGTTTAC CAACGCCCTT TTGGACAACC AGTCCTACAA AGGCTTCAGT 

1051 ATGCTGCTCG ACATCCCGCA CTCTCCGGCA GGCGAAAAAA CCTTCGACGA 

1101 TTTGTTTATG GATTTGGCGG TACGCCTGTC CGGCCAGTTG AACCTGAATC 

1151 TGGTCAACGA CAAAATGGAA GAAGTTTCGA CCCAATGGCT CAAAGACGTG 

1201 CGCACTTATG TATTGGCGCG TCAGTCCGAG ATGCTCAAAG TCGGTATCGA 

1251 ACCGGGCGGC AAAACCGCAT TGCGCCTGTT CTCCTAA 

This corresponds to the amino acid sequence <SEQ ID 526; OPIFI 19-1>: 

1 MIYIVLFLAV VLAVVA YNMY QENQYRKKVR DQFGHSDKDA LLNSKTSHVR 

51 DGKPSGGSVM MPKPQPAVKK TAKPQDPAMR NLQEQDAVYI AKQKQAKASP 

101 FKTEIETALE ESGIIGNSAH TVSEPQTGHS APKPADAPAK PAPVPQTPAK 

151 PLITLKELSK VELPWFDVRF DFISYIALTE AKELHALPRL SNRCRYQIVG 

201 CTMDDHFQIA EPIPGIRYQA FIVGIQAVSR NGLASQEELS AFNRQVDAFA 

251 QSMGGQTLHT DLAAFIEVAS ALDAFCARVD QTIAIHLVSP TSISGVELRS 

301 AVTGVGFVLE DDGAFHYTDT SGSTMFSICS LNNEPFTNAL LDNQSYKGFS 

351 MLLDIPHSPA GEKTFDDLFM DLAVRLSGQL NLNLVNDKME EVSTQWLKDV 

401 RTYVLARQSE MLKVGIEPGG KTALRLFS* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N.meningitidis ("strain A) 

ORFl 19 shows 93.7% identity over a 175aa overlap with an ORF (ORF 1 19a) from strain A oiN. 
meningitidis: 



MIYIVLFLAVVLAWAYNMYQENQYRKKVRDQFGHSDKDALLNSXTSHVRDGKPSGGSVM 
MIYIVLFLAAVLAWAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM 



irf 119 . pep MPKPQPAVKKTAKPQDPXMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH 
>rfll9a MPKPQPAVKKTAKSQDPAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH 



orf 119 . pep TVSEPQTGHSATKPADASAKPAPVPQTPAKPLITLKELSKVELSWFDVRIDFISY 

or f 11 9a TVPEPQTGHSAPKPADAPAKPVPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE 



The complete length ORFl 19a nucleotide sequence <SEQ ID 527> is; 



ATGATTTACA TCGTACTGTT CCTCGCCGCC GTCCTCGCCG TTGTCGCCTA 
CAATATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG 
GGCACTCCGA CAAAGATGCC CTGCTCAACA GCAAAACCAG CCATGTCCGC 
GACGGCAAAC CGTCCGGCGG GCCAGTCATG ATGCCGAAAC CCCAACCGGC 
GGTCAAAAAA ACGGCAAAAT CCCAAGACCC CGCCATGCGC AACCTGCAAG 
AGCAGGATGC CGTCTACATC GCCAAGCAGA AACAGGCAAA AGCCTCCCCG 
TTCAAAACCG AAATCGAAAC CGCCTTGGAA GAAAGCGGCA TTATCGGCAA 
CTCCGCCCAC ACCGTTCCCG AACCCCAAAC CGGACATTCC GCACCAAAAC 
CTGCCGACGC GCCGGCAAAA CCTGTTCCCG TTCCGCAAAC GCCGGCAAAA 
CCGCTGATTA CGCTCAAAGA GCTGTCGAAG GTCGAGCTGC CCTGGTTTGA 
CGTGCGCTTC GACTTCATCT CTTATATCGC GCTGACCGAA GCCAAAGAAC 
TGCACGCACT GCCGCGCCTT TCCAACCGCT GCCGCTACCA GATTGTCGGC 
TGCACCATGG ACGACCATTT CCAGATTGCC GAACCCATCC CGGGCATCCG 
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651 CTATCAGGCA TTTATCGTGG GTATTCAGGC AGTCAGCCGC AACGGACTTG 

701 CCTCGCAGGA AGAACTCTCC GCATTCAACC GCCAGGTGGA TGCATTCGCA 

751 CACAGCATGG GCGGTCAGAC GCTGCACACC GACCTTGCCG CCTTTATCGA 

801 AGTGGCTTCC GCACTGGACG CATTCTGCGC GCGCGTCGAC CAGACTATCG 

851 CCATCCATTT GGTTTCCCCG ACCAGCATCA GCGGCGTAGA ACTGCGTTCC 

901 GCCGTAACGG GCGTGGGTTT CGTTTTGGAA GACGACGGCG CGTTCCACTA 

951 TACCGACACG TCGGGCTCGA CCATGTTCTC CATCTGCTCG CTCAACAACG 

1001 AGCCGTTTAC CAATGCCCTT TTGGACAACC AGTCCTATAA AGGCTTCAGT 

1051 ATGCTGCTCG ACATCCCGCA CTCTCCGGCA GGCGAAAAAA CCTTCGACGA 

1101 TTTGTTTATG GATTTGGCGG TACGCCTGTC CGGCCAGTTG AACCTGAATC 

1151 TGGTCAACGA CAAAATGGAA GAAGTTTCGA CCCAATGGCT CAAAGACGTG 

1201 CGCACTTATG TATTGGCTCG TCAGTCCGAG ATGCTCAAAG TCGGTATCGA 

1251 ACCGGGCGGC TUVAACCGCAT TGCGCCTGTT CTCCTAA 

This encodes a protein having amino acid sequence <SEQ ID 528>; 

1 MIYIVLFLAA VLAWA YNMY QENQYRKKVR DQFGHSDKDA LLNSKTSHVR 

51 DGKPSGGPVM MPKPQPAVKK TAKSQDPAMR NLQEQDAVYI AKQKQAKASP 

101 FKTEIETALE ESGIIGNSAH TVPEPQTGHS APKPADAPAK PVPVPQTPAK 

151 PLITLKELSK VELPWFDVRF DFISYIALTE AKELHALPRL SNRCRYQIVG 

201 CTMDDHFQIA EPIPGIRYQA FIVGIQAVSR NGLASQEELS AFNRQVDAFA 

251 HSMGGQTLHT DLAAFIEVAS ALDAFCARVD QTIAIHLVSP TSISGVELRS 

301 AVTGVGFVLE DDGAFHYTDT SGSTMFSICS LNNEPFTNAL LDNQSYKGFS 

351 MLLDIPHSPA GEKTFDDLFM DLAVRLSGQL NLNLVNDKME EVSTQWLKDV 

401 RTYVLARQSE MLKVGIEPGG KTALRLFS* 

ORFl 19a and ORFl 19-1 show 98.6% identity in 428 aa overlap: 

10 20 30 40 50 60 

orf 119a . pep MIYIVLFLAAVLAWAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM 

orf 119-1 MIYIVLFLAWLAWAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGSVM 
10 20 30 40 50 60 



70 80 90 100 110 120 

orf 119a. pep MPKPQPAVKKTAKSQDPAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 11 9-1 MPKPQPAVKKTAKPQDPAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH 

70 80 90 100 110 120 



TVPEPQTGHSAPKPADAPAKPVPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE 
TVSEPQTGHSAPKPADAPAKPAPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE 



AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS 
AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS 



AFNRQVDAFAHSMGGQTLHTDLAAFIEVASALDAFCARVDQTIAIHLVSPTSISGVELRS 
AFNRQVDAFAQSMGGQTLHTDLAAFIEVASALDAFCARVDQTIAIHLVSPTSISGVELRS 



AVTGVGFVLEDDGAFHYTDTSGSTMFSICSLNNEPFTNALLDNQSYKGFSMLLDIPHSPA 
AVTGVGFVLEDDGAFHYTDTSGSTMFSICSLNNEPFTNALLDNQSYKGFSMLLDIPHSPA 



GEKTFDDLFMDLAVRLSGQLNLNLVNDKMEEVSTQWLKDVRTYVLARQSEMLKVGIEPGG 
GEKTFDDLFMDLAVRLSGQLNLNLVNDKMEEVSTQWLKDVRTYVLARQSEMLKVGIEPGG 



429 
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orfll9a.pep KTALRLFSX 
orfll9-l KTALRLFSX 

Homology with a predicted ORF from N.sonorrhoeae 

0RF119 shows 93.1% identity over a 175aa overlap with a predicted ORF (0RF119ng) from 
N. gonorrhoeae: 

orf 119 . pep MIYIVLFLAWLAWAYNMYQENQYRKKVRDQFGHSDKDALLNSXTSHVRDGKPSGGSVM 60 

orfll9ng MIYIVLFLAAVLAWAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM 60 

orf 119 .pep MPKPQPAVKKTAKPQDPXMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH 120 

orfll9ng MPKPQPAVKKPAKPQDSAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEEIGIIGNSAH 120 

orf 119 .pep TVSEPQTGH3ATKPADASAKPAPVPQTPAKPLITLKELSKVELSWFDVRIDFISY 175 

orfll9ng TVSEPQTGHSAPKPADAPAKPVPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE 180 

The complete length ORFl 19ng nucleotide sequence <SEQ ID 529> is: 

1 ATGATTTACA TCGTACTGTT CCTCGCCGCC GTCCTCGCCG TTGTCGCCTA 

51 CAATATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG 

101 GACACTCCGA CAAAGATGCC CTGCTCAACA GCAAAACCAG CCATGTCCGC 

151 GACGGCAAAC CGTCCGGCGG GCCAGTCATG ATGCCGAAAC CCCAACCGGC 

201 GGTCAAAAAA CCGGCCAAAC CCCAAGACTC CGCCATGCGC AACCTGCAAG 

251 AACAGGATGC CGTCTACATC GCCAAGCAGA AACAGGCAAA AGCCTCCCCG 

301 TTCAAAACCG AAATCGAAAC CGCCTTGGAA GAAATCGGCA TTATCGGCAA 

351 CTCCGCCCAC ACCGTTTCCG AACCCCAAAC CGGACATTCC GCACCGAAAC 

401 CTGCCGACGC GCCGGCAAAA CCCGTTCCCG TTCCGCAAAC GCCGGCAAAA 

451 CCGCTGATTA CGCTCAAAGA GCTGTCGAAG GTCGAGCTGC CCTGGTTTGA 

501 CGTGCGCTtc gACTTCATCT CCTATATCGC GCTGACCGAA GCCAAAGAAC 

551 TGCACGCACT GCCGCGCCTT tccAACCGCT GCCGCTACCA GATTGTCGGC 

601 TGCACCATGG ACGACCATTT CCAGATTGCC GAACCCATCC CGGGCATCCG 

651 CTATCAGGCA TTTATCGTGG GTATCCAGGC AGTCAGCCGC AACGGACTTG 

701 CCTCGCAGGA AGAACTCTCC GCATTCAACC GCCAGGCGGA CGCATTCGCA 

751 CAAAGCATGG GCGGTCAGAC GCTGCACACC GACCTTGCCG CCTTTATCGA 

801 AGTGGCTTCC GCACTGGACG CATTCTGCGC GCGCGTCGAC CAGACCATCG 

851 CCATCCATTT GGTTTCGCCG ACCAGCATCA GCGGCGTAGA ACTGCGTTCC 

901 GCCGTAACGG GCGTGGGTTT CGTTTTGGAA GACGACGGCG CGTTCCACTA 

951 TACCGACACG TCGGGCTCGA CCATGTTCTC CATCTGCTCG CTCAACAACG 

1001 AGCCGTTTAC CAATGCCCTT TTGGACAACC AGTCCTACAA AGGCTTCAGT 

1051 ATGCTGCTCG ACATCCCGCA CTCTCCGGCA GGCGAAAAAA CCTTCGACGA 

1101 TTTGTTTATG GATTTGGCGG TACGCCTGTC CGGTCAGTTG AACCTGAATC 

1151 TGGTCAACGA CAAAATGGAA GAAGTTTCGA CCCAATGGCT CAAAGACGTA 

1201 CGCACTTATG TATTGGCGCG TCAGTCCGAG ATGCTCAAAG TCGGTATCGA 

1251 ACCGGGCGGC AAAACCGCCC TGCGCCTGTT TTCATAA 

This encodes a protein having amino acid sequence <SEQ ID 530>: 

1 MIYIVLFLAA VLAWA YHMY QENQYRKKVR DQFGHSDKDA LLNSKTSHVR 

51 DGKPSGGPVM MPKPQPAVKK PAKPQDSAMR NLQEQDAVYI AKQKQAKASP 

101 FKTEIETALE EIGIIGNSAH TVSEPQTGHS APKPADAPAK PVPVPQTPAK 

151 PLITLKELSK VELPWFDVRF DFISYIALTE AKELHALPRL SNRCRYQIVG 

201 CTMDDHFQIA EPIPGIRYQA FIVGIQAVSR NGLASQEELS AFNRQADAFA 

251 QSMGGQTLHT DLAAFIEVAS ALDAFCARVD QTIAIHLVSP TSISGVELRS 

301 AVTGVGFVLE DDGAFHYTDT SGSTMFSICS LNNEPFTNAL LDNQSYKGFS 

351 MLLDIPHSPA GEKTFDDLFM DLAVRLSGQL NLNLVNDKME EVSTQWLKDV 

4 01 RTYVLARQSE MLKVGIEPGG KTALRLFS* 

ORFl 19ng and ORFl 19-1 show 98.4% identity over 428 aa overlap: 

10 20 30 40 50 60 

orfll9ng MIYIVLFLAAVLAWAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM 

orf 11 9-1 MIYIVLFLAWLAWAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGSVM 
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orfll9ng MPKPQPAVKKPAKPQDSAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEEIGIIGNSAH 

5 orf 119-1 MPKPQPAVKKTAKPQDPAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH 

70 80 90 100 110 120 

130 140 150 160 170 180 

orfll9ng TVSEPQTGH3APKPADAPAKPVPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE 

orfll9-l TVSEPQTGHSAPKPADAPAKPAPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE 
130 140 150 160 170 180 

190 200 210 220 230 240 

15 orfll9ng AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLA3QEELS 

orf 11 9-1 AKELEIALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS 
190 200 210 220 230 240 

20 250 260 270 280 290 300 

o r f 1 1 9ng AFNRQADAFAQSMGGQTLHTDLAAFIEVASALDAFCARVDQT lAIHLVS PTS I SGVELRS 

orfll9-l AFNRQVDAFAQSMGGQTLHTDLAAFIEVASALDAFCARVDQTIAIHLVSPTSISGVELRS 
250 260 270 280 290 300 

25 

310 320 330 340 350 360 

orf 119ng AVTGVGFVLEDDGAFHYTDTSGSTMFSICSLNNEPFTNALLDNQSYKGFSMLLDIPHSPA 

orf 11 9-1 AVTGVGFVLEDDGAFHYTDTSGSTMF3ICSLNNEPFTNALLDNQSYKGFSMLLDIPHSPA 
30 310 320 330 340 350 360 

370 380 390 400 410 420 

orfll9ng GEKTFDDLFMDLAVRLSGQLNLNLVNDKMEEVSTQWLKDVRTYVLARQSEMLKVGIEPGG 

35 orf 11 9-1 GEKTFDDLFMDLAVRLSGQLNLNLVNDKMEEVSTQWLKDVRTYVLARQSEMLKVGIEPGG 

370 380 390 400 410 420 

429 

orfll9ng KTALRLFSX 

40 III 

orfll9-l KTALRLFSX 

Based on this analysis, including the presence of a putative leader sequence in the gonococcal 
protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, 
could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

45 Example 64 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 53 1> 

1 ..GCGCGGCACG GCACGGAAGA TTTCTTCATG AACAACAGCG ACAC.ATCAG 

51 GCAGATAGTC GAAAGCACCA CCGGTACGAT GAAGCTGCTG ATTTCCTCCA 

101 TCGCCCTGAT TTCATTGGTA GTCGGCGGCA TCGGCGTGAT GAACATCATG 

50 151 CTGGTGTCCG TTACCGAGCG CACCAAAGAA ATCGGCATAC GGATGGCAAT 

201 CGGCGCGCGG CGCGGCAATA TTTyGCAGCA GTTTTTGATT GAGGCGGTGT 

251 TAATCTGCGT CATCGGCGGT TTGGTCGGCG TGGGTTTGTC CGCCGCCGTC 

301 AGCCTCGTGT TCAATCATTT TGTAACCGAC TTCCCGATGG ACATTTCCGC 

351 CATGTCCGTC ATCGGCGCGG TCGCCTGTTC GACCGGAATC GGCATCGCGT 

55 401 TCGGCTTTAT GCCTGCCAAT AAAGCAGCCA AACTCAATCC GATAGACGCA 

451 TTGGCACAGG ATTGA 

This corresponds to the amino acid sequence <SEQ ED 532; ORF134>: 

1 . .ARHGTEDFFM NNSDXIRQIV ESTTGTMKLL ISSIALISLV VGGIGVMNIM 

51 LVSVTERTKE IGIRMAIGAR RGNIXQQFLI EAVLICVIGG LVGVGLSAAV 

60 101 SLVFNHFVTD FPMDISAMSV IGAVAC3TGI GIAFGFMPAN KAAKLNPIDA 

151 LAQD* 
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Further work revealed the complete nucleotide sequence <SEQ ID 533>: 

1 ATGTCGGTGC AAGCAGTATT GGCGCACAAA ATGCGTTCGC TTCTGACGAT 

51 GCTCGGCATC ATCATCGGTA TCGCGTCGGT GGTTTCCGTC GTCGCATTGG 

101 GCAATGGTTC GCAGAAAAAA ATCCTTGAAG ACATCAGTTC GATAGGGACG 

151 AACACCATCA GCATCTTCCC GGGGCGCGGC TTCGGCGACA GGCGCAGCGG 

201 CAGGATTAAA ACCCTGACCA TAGACGACGC AAAAATCATC GCCAAACAAA 

251 GCTACGTTGC TTCCGCCACG CCCATGACTT CGAGCGGCGG CACGCTGACT 

301 TACCGCAACA CCGACCTGAC CGCCTCGCTT TACGGCGTGG GCGAACAATA 

351 TTTCGACGTG CGCGGACTGA AGCTGGAAAC GGGGCGGCTG TTTGACGAAA 

401 ACGATGTGAA AGAAGACGCG CAGGTCGTCG TCATCGACCA AAATGTCAAA 

451 GACAAACTCT TTGCGGACTC GGATCCGTTG GGTAAAACCA TTTTGTTCAG 

501 GAAACGCCCC TTGACCGTCA TCGGCGTGAT GAAAAAAGAC GAAAACGCTT 

551 TCGGCAATTC CGACGTGCTG ATGCTTTGGT CGCCCTATAC GACGGTGATG 

601 CACCAAATCA CAGGCGAGAG CCACACCAAC TCCATCACCG TCAAAATCAA 

651 AGACAATGCC AATACCCAGG TTGCCGAAAA AGGGCTGACC GATCTGCTCA 

701 AAGCGCGGCA CGGCACGGAA GATTTCTTCA TGAACAACAG CGACAGCATC 

751 AGGCAGATAG TCGAAAGCAC CACCGGTACG ATGAAGCTGC TGATTTCCTC 

801 CATCGCCCTG ATTTCATTGG TAGTCGGCGG CATCGGCGTG ATGAACATCA 

851 TGCTGGTGTC CGTTACCGAG CGCACCAAAG AAATCGGCAT ACGGATGGCA 

901 ATCGGCGCGC GGCGCGGCAA TATTTTGCAG CAGTTTTTGA TTGAGGCGGT 

951 GTTAATCTGC GTCATCGGCG GTTTGGTCGG CGTGGGTTTG TCCGCCGCCG 

1001 TCAGCCTCGT GTTCAATCAT TTTGTAACCG ACTTCCCGAT GGACATTTCC 

1051 GCCATGTCCG TCATCGGCGC GGTCGCCTGT TCGACCGGAA TCGGCATCGC 

1101 GTTCGGCTTT ATGCCTGCCA ATAAAGCAGC CAAACTCAAT CCGATAGACG 

1151 CATTGGCACA GGATTGA 

This corresponds to the amino acid sequence <SEQ ID 534; ORF134-l>: 



1 MSVQAVLAHK MRSLLTMLGI IIGIASWSV VALG NGSQKK ILEDISSIGT 

51 NTISIFPGRG FGDRRSGRIK TLTIDDAKII AKQSYVASAT PMTSSGGTLT 

101 YRNTDLTASL YGVGEQYFDV RGLKLETGRL FDENDVKEDA QVWIDQNVK 

151 DKLFADSDPL GKTILFRKRP LTVIGVMKKD ENAFGNSDVL MLWSPYTTVM 

201 HQITGESHTN SITVKIKDNA NTQVAEKGLT DLLKARHGTE DFFMNNSDSI 

251 RQIVESTTGT MKL LISSIAL ISLWGGIGV MNIMLVSVTE RTKEIGIRMA 

301 IGARRGNILQ Q FLIEAVLIC VIGGLVGV GL SAAVSLVFNH FVTDFPMDIS 

351 AMS VIGAVAC STGIGIAFGF MPANKAAKLN PIDALAQD* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with the hypothetical protein o648 of E. coli (accession number AEOQQl 89) 
ORF134 and o648 protein show 45% aa identity in 153aa overlap: 

Orfl34: 2 RHGTEDFFMNNSDXIRQIVESTTGTMKXXXXXXXXXXXWGGIGVMNIMLVSVTERTKEI 61 

RHG +DFF N D + + VE TT T++ WGGIGVMNIMLVSVTERT+EI 
0648: 496 RHGKKDFFTWNMDGVLKTVEKTTRTLQLFLTLVAVISLWGGIGVMNIMLVSVTERTREI 555 

Orfl34: 62 GIRMAIGARRGNIXQQFLIEAXXXXXXXXXXXXXXXXXXXXXFNHFVTDFPMDISAMSVI 121 

GIRMA+GAR ++ QQFLIEA F+ + + S ++++ 

0648: 556 GIRMAVGARASDVLQQFLIEAVLVCLVGGALGITLSLLIAFTLQLFLPGWEIGFSPLALL 615 

Orfl34: 122 GAVACSTGIGIAFGFMPANKAAKLNPIDALAQD 154 

A CST GI FG++PA AA+L+P+DALA++ 
o648: 616 LAFLCSTVTGILFGWLPARNAARLDPVDALARE 648 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF134 shows 98.7% identity over a 154aa overlap with an ORF (ORF134a) from strain A of 
meningi-idis: 



ARHGTEDFFMNNSDXIRQIVESTTGTMKLL 
GESHTNSITVKIKDNANTQVAEKGLTDLLKARHGTEDFFMNNSDSIRQIVESTTGTMKLL 
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orf 134 . pep ISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMAIGARRGNIXQQFLIEAVLICVIGG 
orfl34a ISSIALISLWGGIGVMNIMLVSVTERTKEIGIRMAIGARRGNILQQFLIEAVLICVIGG 



orf 134 . pep LVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVACSTGIGIAFGFMPANKAAKLNPIDA 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I M I I I I I I I I M 
orf 134a LVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVACSTGIGIAFGFMPANKAAKLNPIDA 



orf 134. pep LAQDX 
I I I I I 

orfl34a LAQDX 

The complete length ORF134a nucleotide sequence <SEQ ID 535> is: 



1001 
1051 
1101 
1151 



ATGTCGGTGC 
GCTCGGCATC 
GCAACGGTTC 
AACACCATCA 
CAGGATTAAA 
GCTACGTTGC 
TACCGCAATA 
TTTCGACGTG 
ACGATGTGAA 
GACAAACTCT 
GAAACGCCCC 
TCGGCAATTC 
CACCAAATCA 
AGACAATGCC 
AAGCGCGGCA 
AGGCAGATAG 
CATCGCCCTG 
TGCTGGTGTC 
ATCGGCGCGC 
GTTAATCTGC 
TCAGCCTCGT 
GCCATGTCCG 
GTTCGGCTTT 
CATTGGCGCA 



AAGCAGTATT 
ATCATCGGTA 
GCAGAAAAAA 
GCATCTTCCC 
ACCCTGACCA 
TTCCGCCACG 
CCGACCTGAC 
CGCGGGCTGA 
AGAAGACGCG 
TTGCGGACTC 
TTGACCGTCA 
CGACGTGCTG 
CAGGCGAGAG 
AATACCCAGG 
CGGCACGGAA 
TCGAAAGCAC 
ATTTCATTGG 
CGTTACCGAG 
GGCGCGGCAA 
GTCATCGGCG 
GTTCAATCAT 
TCATCGGCGC 
ATGCCTGCCA 
GGATTGA 



GGCGCACAAA 
TCGCTTCGGT 
ATCCTTGAAG 
AGGGCGCGGC 
TAGACGACGC 
CCCATGACTT 
CGCTTCTTTG 
AGCTGGAARC 
CAGGTCGTCG 
GGATCCGTTG 
TCGGCGTGAT 
ATGCTTTGGT 
CCACACCAAC 
TTGCCGAAAA 
GATTTCTTCA 
CACCGGTACG 
TAGTCGGCGG 
CGCACCAAAG 
TATTTTGCAG 
GTTTGGTCGG 
TTTGTAACCG 
GGTCGCCTGT 
ATAAAGCAGC 



ATGCGTTCGC 
TGTCTCCGTC 
ACATCAGTTC 
TTCGGCGACA 
AAAAATCATC 
CGAGCGGCGG 
TACGGTGTGG 
GGGGCGGCTG 
TCATCGACCA 
GGTAAAACCA 
GAAAAAAGAC 
CGCCCTATAC 
TCCATCACCG 
AGGGCTGACC 
TGAACAACAG 
ATGAAGCTGC 
CATCGGCGTG 
AAATCGGCAT 
CAGTTTTTGA 
CGTGGGTTTG 
ACTTCCCGAT 
TCGACCGGAA 
CAAACTCAAT 



TTCTGACGAT 
GTCGCATTGG 
GATAGGGACG 
GGCGCAGCGG 
GCCAAACAAA 
CACGCTGACT 
GCGAACAATA 
TTTGACGAAA 
AAATGTCAAA 
TTTTGTTCAG 
GAAAACGCTT 
GACGGTGATG 
TCAAAATCAA 
GATCTGCTCA 
CGACAGCATC 
TGATTTCCTC 
ATGAACATCA 
ACGGATGGCA 
TTGAGGCGGT 
TCCGCCGCCG 
GGACATTTCC 
TCGGCATCGC 
CCGATAGATG 



This encodes a protein having amino acid sequence <SEQ ID 536>: 



MSVQAVLAHK MRSLLTMLGI IIGIASWSV 
NTISIFPGRG FGDRRSGRIK TLTIDDAKII 
YRNTDLTASL YGVGEQYFDV RGLKLETGRL 
DKLFADSDPL GKTILFRKRP LTVIGVMKKD 
HQITGESHTN SITVKIKDNA NTQVAEKGLT 
RQIVESTTGT MKL LI55IAL ISLWGGIGV 
IGARRGNILQ Q FLIEAVLIC VIGGLVGV GL 
AMSVIGAVAC STGIGIAFGF MPANKAAKLN 



ORF134a and ORF134-1 show 100.0% identity in 3 



VALG NGSQKK 
AKQSYVASAT 
FDENDVKEDA 
ENAFGNSDVL 
DLLKARHGTE 
MNIMLVSVTE 
SAAVSLVFNH 
PIDALAQD* 

3 aa overlap: 



ILEDISSIGT 
PMTSSGGTLT 
QVWIDQNVK 
MLWSPYTTVM 
DFFMNNSDSI 
RTKEIGIRMA 
FVTDFPMDIS 



orf 134a. pep MSVQAVLAHKMRSLLTMLGIIIGIASWSWALGNGSQKKILEDISSIGTNTISIFPGRG 

orf 134-1 MSVQAVLAHKMRSLLTMLGIIIGIASWSWALGNGSQKKILEDISSIGTNTISIFPGRG 

orf 134a . pep FGDRRSGRIKTLTIDDAKIIAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV 

orf 134-1 FGDRRSGRIKTLTIDDAKIIAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV 

orf 134a. pep RGLKLETGRLFDENDVKEDAQWVIDQNVKDKLFADSDPLGKTILFRKRPLTVIGVMKKD 

orfl34-l RGLKLETGRLFDENDVKEDAQVWI DQNVKDKLFADS DPLGKTI LFRKRPLTVIGVMKKD 

orf 134a . pep ENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTQVAEKGLTDLLKARHGTE 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I 

orf 134-1 ENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTQVAEKGLTDLLKARHGTE 
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orfl34e 
orfl34- 



orfl34E 
orfl34- 



DFFMNNSDSIRQIVESTTGTMKLLISSIALISLWGGIGVMNIMLVSVTERTKEIGIRMA 

DFFMNNSDSIRQIVESTTGTMKLLISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMA 

IGARRGNILQQFLIEAVLICVIGGLVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVAC 

IGARRGNILQQFLIEAVLICVIGGLVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVAC 

STGIGIAFGFMPANKAAKLNPIDALAQDX 

STGIGIAFGFMPANKAAKLNPIDALAQDX 



Homology with a predicted ORF from N. gonorrhoeae 

ORF134 shows 96.8% identity over a 154aa overlap with a predicted ORF (ORF134.ng) from N. 
gonorrhoeae: 

orfl34.pep ARHGTEDFFMNNSDXIRQIVESTTGTMKLL 30 

orfl34ng GESHTNSITVKIKDNANTRVAEKGLAELLKARHGTEDFFMNNSDSIRQMVESTTGTMKLL 264 

orf 134 .pep ISSIALISLWGGIGVMNIMLVSVTERTKEIGIRMAIGARRGNIXQQFLIEAVLICVIGG 90 
orfl34ng ISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRmiGARRGNILQQFLIEAVLICIIGG 324 

orf 134 .pep LVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVACSTGIGIAFGFMPANKAAKLNPIDA 150 
orfl34ng LVGVGLSAAVSLVFNHFVTDFPMDISAASVIGAVACSTGIGIAFGFMPANKAAKLNPIDA 384 

orf 134. pep LAQD 154 
orfl34ng LAQD 388 



The complete length ORF134ng nucleotide sequence <SEQ ID 537> is: 



1001 
1051 
1101 
1151 



RTGTCGGTGC 
GCTCGGCATC 
GCAACGGTTC 
AACACCATCA 
CAAAATCAAA 
GCTACGTTGC 
TACCGCAATA 
TTTCGACGTG 
ACGATGTGAA 
GACAAACTCT 
GAAACGCCCC 
TCGGCAATTC 
CACCAAATCA 
AGACAATGCC 
AAGCACGGCA 
AGGCAGATGG 
CATCGCCCTG 
TGCTGGTGTC 
ATCGGCGCGC 
GTTAATCTGC 
TCAGCCTCGT 
GCGGCATCCG 
GTTCGGCTTT 
CATTGGCGCA 



AAGCAGTATT 
ATCATCGGTA 
GCAGAAAAAA 
GCATCTTCCC 
ACCCTGACCA 
CTCCGCCACG 
CCGACCTGAC 
CGCGGGCTGA 
AGAAGACGCG 
TTGCGGACTC 
TTGACCGTCA 
CGACGTGCTG 
CAGGCGAGAG 
AATACCCGGG 
CGGCACGGAA 
TCGAAAGCAC 
ATTTCATTGG 
CGTTACCGAG 
GGCGCGGCAA 
ATCATCGGAG 
GTTCAATCAT 
TTATCGGGGC 
ATGCCTGCCA 
GGATTGA 



GGCGCACAAA 
TCGCTTCGGT 
ATCCTCGAAG 
CGGGCGCGGC 
TAGACGACGC 
CCCATGACTT 
CGCTTCTTTG 
AGCTGGAAAC 
CAAGTCGTCG 
GGATCCGTTG 
TCGGCGTGAT 
ATGCTTTGGT 
CCACACCAAC 
TTGCCGAAAA 
GACTTCTTTA 
CACCGGTACG 
TAGTCGGCGG 
CGCACCAAAG 
TATTTTGCAG 
GCTTGGTCGG 
TTTGTAACCG 
GGTCGCCTGT 
ATAAGGCAGC 



ATGCGTTCGC 
TGTCTCCGTC 
ACATCAGTTC 
TTCGGCGACA 
AAAAATCATC 
CGAGCGGCGG 
TACGGTGTGG 
GGGGCGGCTG 
TCATCGACCA 
GGTAAAACCA 
GAAAAAAGAC 
CGCCCTATAC 
TCCATCACCG 
AGGGCTGGCC 
TGAACAACAG 
ATGAAGCTGC 
CATCGGTGTG 
AAATCGGCAT 
CAGTTTTTGA 
CGTAGGTTTG 
ATTTCCCGAT 
TCGACCGGAA 
CAAACTCAAT 



TTCTGACCAT 
GTCGCGCTGG 
GATGGGGACG 
GGCGCAGCGG 
GCCAAACAAA 
CACGCTGACC 
GCGAACAATA 
TTTGATGAGA 
AAATGTCAAA 
TTTTGTTCAG 
GAAAACGCTT 
GACGGTGATG 
TCAAAATCAA 
GAGCTGCTCA 
CGACAGCATC 
TGATTTCCTC 
ATGAACATTA 
ACGGATGGCA 
TTGAGGCGGT 
TCCGCCGCCG 
GGACATTTCG 
TCGGCATCGC 
CCGATAGATG 



This encodes a protein having amino acid sequence <SEQ ED 538>: 



1 MSVQAVLAHK MRSLLTMLGI IIGIASWSV VALG NGSQKK ILEDISSMGT 

51 NTISIFPGRG FGDRRSGKIK TLTIDDAKII AKQSYVASAT PMTS3GGTLT 

101 YRNTDLTASL YGVGEQYFDV RGLKLETGRL FDENDVKEDA QVWIDQNVK 

151 DKLFADSDPL GKTILFRKRP LTVIGVMKKD ENAFGN3DVL MLWSPYTTVM 

201 HQITGESHTN SITVKIKDNA NTRVAEKGLA ELLKARHGTE DFFMNNSDSI 

251 RQMVESTTGT MKL LISSIAL ISLWGGIGV MNIMLVSVTE RTKEIGIRMA 

301 IGARRGNILQ Q FLIEAVLIC IIGGLVGV GL SAAVSLVFNH FVTDFPMDIS 
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351 AAS VIGAVflC 5TGIGIAFGF MPANKAAKLN PIDALAQD* 

ORF134ng and ORF134-1 show 97.9% identity in 388 aa overlap: 

orfl34ng MSVQAVLAHKMRSLLTMLGIIIGIASWSVVALGNGSQKKILEDISSMGTNTISIFPGRG 

orf 134-1 MSVQAVLAHKMRSLLTMLGIIIGIASWSWALGNGSQKKILEDISSIGTNTISIFPGRG 

orfl3 4ng FGDRRSGKIKTLTIDDAKIIAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV 

orf 134-1 FGDRRSGRIKTLTIDDAKIIAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV 

orfl34ng RGLKLETGRLFDENDVKEDAQWVIDQNVKDKLFADSDPLGKTILFRKRPLTVIGVMKKD 

orf 134-1 RGLKLETGRLFDENDVKEDAQVWIDQNVKDKLFADSDPLGKTILFRKRPLTVIGVMKKD 

orfl34ng ENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTRVAEKGLAELLKARHGTE 

orf 134-1 ENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTQVAEKGLTDLLKARHGTE 

orfl34ng DFFMNNSDSIRQMVESTTGTMKLLISSIALISLWGGIGVMNIMLVSVTERTKEIGIRMA 

orf 134-1 DFFMNNSDSIRQIVESTTGTMKLLISSIALISLWGGIGVMNIMLVSVTERTKEIGIRMA 

orfl34ng IGARRGNILQQFLIEAVLICIIGGLVGVGLSAAVSLVFNHFVTDFPMDISAASVIGAVAC 
I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I t I I 

orf 134-1 IGARRGNILQQFLIEAVLICVIGGLVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVAC 

orfl34ng STGIGIAFGFMPANKAAKLNPIDALAQDX 

orf 134-1 STGIGIAFGFMPANKAAKLNPIDALAQDX 



30 ORFl 34ng also shows homology to an E. coli ABC transporter: 



= 230/389 (58%), Gaps = 1/389 (0%) 
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Based on this analysis, including the presence of the leader peptide and transmembrane regions in 
the gonococcal protein, it is prediceted that these proteins Irom N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 65 



The following partial DNA sequence was identified mN. meningitidis <SEQ ID 539>: 

1 . . GGGACGGGAG CGATGCTGCT GCTGTTTTAC GCGGTAACGA T . CTGCCTTT 

51 GGCCACTGGC GTTACCCTGA GTTACACCTC GTCGATTTTT TTGGCGGTAT 

101 TTTCCTTCCT GATTTTGAAA GAACGGATTT CCGTTTACAC GCAGGCGGTG 

151 CTGCTCCTTG GTTTTGCCGG CGTGGTATTG CTGCTTAATC CCTCGTTCCG 

201 CAGCGGTCAG GAAACGGCGG CACTCGCCGG GCTGGCGGGC GGCGCGATGT 

251 CCGGCTGGGC GTATTTGAAA GTGCGCGAAC TGTCTTTGGC GGGCGAACCC 

301 GGCTGGCGCG TCGTGTTTTA CCTTTCCGTG ACAGGTGTGG CGATGTCGTC 

351 GGTTTGGGCG ACGCTGACCG GCTGGCACAC CCTGTCCTTT CCATCGGCAG 

401 TTTATCTGTC GTGCATCGGC GTGTCCGCGC TGATTGCCCA ACTGTCGATG 

451 ACGCGCGCCT ACAAAGTCGG CGACAAATTC ACGGTTGCCT CGCTTTCCTA 

501 TATGACCGTC GTTTTTTCCG CTCTGTCTGC CGCATTTTTT CTGGGCGAAG 

551 AGCTTTTCTG GCAGGAAATA CTCGGTATGT GCATCATCAT CfiTCAGCGGT 

601 ATTTTGA 

This corresponds to the amino acid sequence <SEQ ID 540; ORF135>: 



1 ..GTGAWLLLFY AVTILPLATG VTLSYTSSIF LAVFSFLILK ERISVYTQAV 
51 LLLGFAGVVL LLNPSFRSGQ ETAALAGLAG GAMSGWAYLK VRELSLAGEP 
101 GWRWFYLSV TGVAMSSVWA TLTGWHTLSF PSAVYLSCIG VSALIAQLSM 
151 TRAYKVGDKF TVASLSYMTV VFSALSAAFF LGEELFWQEI LGMCIIISAV 
201 F* 

Further work revealed the complete nucleotide sequence <SEQ ID 54 1>; 



1 ATGGATACCG CAAAAAAAGA CATTTTAGGA TCGGGCTGGA TGCTGGTGGC 

51 GGCGGCCTGC TTTACCATTA TGAACGTATT GATTAAAGAG GCATCGGCAA 

101 AATTTGCCCT CGGCAGCGGC GAATTGGTCT TTTGGCGCAT GCTGTTTTCA 

151 ACCGTTGCGC TCGGGGCTGC CGCCGTATTG CGTCGGGACA mCTTCCGCAC 

201 GCCCCATTGG AAAAACCACT TAAACCGCAG TATGGTCGGG ACGGGGGCGA 

251 TGCTGCTGCT GTTTTACGCG GTAACGCATC TGCCTTTGGC CACTGGCGTT 

301 ACCCTGAGTT ACACCTCGTC GATTTTTTTG GCGGTATTTT CCTTCCTGAT 

351 TTTGAAAGAA CGGATTTCCG TTTACACGCA GGCGGTGCTG CTCCTTGGTT 

401 TTGCCGGCGT GGTATTGCTG CTTAATCCCT CGTTCCGCAG CGGTCAGGAA 

451 ACGGCGGCAC TCGCCGGGCT GGCGGGCGGC GCGATGTCCG GCTGGGCGTA 

501 TTTGAAAGTG CGCGAACTGT CTTTGGCGGG CGAACCCGGC TGGCGCGTCG 

551 TGTTTTACCT TTCCGTGACA GGTGTGGCGA TGTCGTCGGT TTGGGCGACG 

601 CTGACCGGCT GGCACACCCT GTCCTTTCCA TCGGCAGTTT ATCTGTCGTG 

651 CATCGGCGTG TCCGCGCTGA TTGCCCAACT GTCGATGACG CGCGCCTACA 

701 AAGTCGGCGA CAAATTCACG GTTGCCTCGC TTTCCTATAT GACCGTCGTT 

751 TTTTCCGCTC TGTCTGCCGC ATTTTTTCTG GGCGAAGAGC TTTTCTGGCA 

801 GGAAATACTC GGTATGTGCA TCATCATCCT CAGCGGTATT TTGAGCAGCA 

851 TCCGCCCCAC TGCCTTCAAA CAGCGGCTGC AATCCCTGTT CCGCCAAAGA 

901 TAA 

This corresponds to the amino acid sequence <SEQ ID 542; ORF135-l>: 



1 MDTAKKDILG SGWMLVAAA C FTIMNVLIKE ASAKFALGSG ELVFWRMLFS 

51 TVALGAAAVL RRDXFRTPHW KNHLNRS MVG TGAMLLLFYA VTHL PLATGV 

101 T LSYTSSIFL AVFSFLIL KE RISVYTQA VL LLGFAGWLL LNPSF RSGQE 

151 TAALAGLAGG AMSGWAYLKV RELSLAGEPG WRWFYLSVT GVAMSSVWAT 

201 LTGWHTLS FP SAVYLSCIGV SALIA QLSMT RAYKVGDKFT VAS LSYMTW 

251 FSALSAAFFL GEELFWQ EIL GMCIIILSGI LSSI RPTAFK QRLQSLFRQR 

301 * 



Computer analysis of this amino acid sequence gave the following results: 
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Homology with a predicted ORF from N.meninsitidis (strain A) 

ORF135 shows 99.0% identity over a 197aa overlap with an ORF (ORF135a) from sfrain A of N. 
meningitidis: 

10 20 30 

orf 135 . pep GTGAMLLLFYAVTILPLATGVTLSYTSSIF 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 135a 3TVALGAAAVLRRDTFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLATGVTLSYTSSIF 
50 60 70 80 90 100 

40 50 60 70 80 90 

orf 135. pep LAVFSFLILKERISVYTQAVLLLGFAGWLLLNP3FRSGQETAALAGLAGGAMSGWAYLK 

orf 135a LAVFSFLILKERISVYTQAVLLLGFAGWLLLNPSFRSGQETAALAGLAGGAMSGWAYLK 
110 120 130 140 150 160 

100 110 120 130 140 150 

orf 135. pep VRELSLAGEPGWRWFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALI.AQLSM 

orf 135a VRELSLAGEPGWRVVFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSM 
170 180 190 200 210 220 



150 170 180 190 200 

orfl35.pep TRAYKVGDKFTVASLS YMTVVFSALSAAFFLGEELFWQE I LGMCI 1 1 SAVFX 

orf 135a TRAYKVGDKFTVA3L3YMTVVFSALSAAFFLAEELFWQEILGMCIIILSGILSSIRPTAF 
230 240 250 260 270 280 



orfl35a KQRLQSLFRQRX 
290 300 

The complete length ORF135a nucleotide sequence <SEQ ID 543> is: 

1 ATGGATACCG CAAAAAAAGA CATTTTAGGA TCGGGCTGGA TGCTGGTGGC 

51 GGCGGCCTGC TTTACCATTA TGAACGTATT GATTAAAGAG GCATCGGCAA 

101 AATTTGCCCT CGGCAGCGGC GAATTGGTCT TTTGGCGCAT GCTGTTTTCA 

151 ACCGTTGCGC TCGGGGCTGC CGCCGTATTG CGTCGGGACA CCTTCCGCAC 

201 GCCCCATTGG AAAAACCACT TAAACCGCAG TATGGTCGGG ACGGGGGCGA 

251 TGCTGCTGCT GTTTTACGCG GTAACGCATC TGCCTTTGGC CACCGGCGTT 

301 ACCCTGAGTT ACACCTCGTC GATTTTTTTG GCGGTATTTT CCTTCCTGAT 

351 TTTGAAAGAA CGGATTTCCG TTTACACGCA GGCGGTGCTG CTCCTTGGTT 

401 TTGCCGGCGT GGTATTGCTG CTTAATCCCT CGTTCCGCAG CGGTCAGGAA 

4 51 ACGGCGGCAC TCGCCGGGCT GGCGGGCGGC GCGATGTCCG GCTGGGCGTA 

501 TTTGAAAGTG CGCGAACTGT CTTTGGCGGG CGAACCCGGC TGGCGCGTCG 

551 TGTTTTACCT TTCCGTGACA GGTGTGGCGA TGTCATCGGT TTGGGCGACG 

501 CTGACCGGCT GGCACACCCT GTCCTTTCCA TCGGCAGTTT ATCTGTCGTG 

651 CATCGGCGTG TCCGCGCTGA TTGCCCAACT GTCGATGACG CGCGCCTACA 

701 AAGTCGGCGA CAAATTCACG GTTGCCTCGC TTTCCTATAT GACCGTCGTT 

751 TTTTCCGCTC TGTCTGCCGC ATTTTTTCTG GCCGAAGAGC TTTTCTGGCA 

801 GGAAATACTC GGTATGTGCA TCATCATCCT CAGCGGTATT TTGAGCAGCA 

851 TCCGCCCCAC TGCCTTCAAA CAGCGGCTGC AATCCCTGTT CCGCCAAAGA 

901 TAA 

This encodes a protein having amino acid sequence <SEQ ID 544>: 



1 MDTAKKDILG SGWMLVAAA C FTIMNVLIKE ASAKFALGSG ELVFWRMLFS 

51 TVALGAAAVL RRDTFRTPHW KNHLNRS MVG TGAMLLLFYA VTHL PLATGV 

101 T LSYTSSIFL AVFSFLIL KE RISVYTQ AVL LLGFAGWLL LNPSF RSGQE 

151 TAALAGLAGG AMSGWAYLKV RELSLAGEPG WRWFYLSVT GVAMSSVWAT 

201 LTGWHTLS FP SAVYLSCIGV SALIA QLSMT RAYKVGDKFT VAS LSYMTVV 

251 FSALSAAFFL AEELFWQ EIL GMCIIILSGI LSSI RPTAFK QRLQSLFRQR 

301 * 

ORF135a and ORF135-1 show 99.3% identity in 300 aa overlap: 



orf 135a . pep MDTAKKDILGSGWMLVAAACFTIMNVLIKEASAKFALGSGELVFWRMLFSTVALGAAAVL 
orf 135-1 MDTAKKDILG3GWMLVAAACFTIMNVLIKEASAKFALG3GELVFWRMLFSTVALGAAAVL 
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orf 135a . pep RRDTFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLATGVTLSYTSSIFLAVFSFLILKE 

orf 135-1 RRDXFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLATGVTLSYTSSIFLAVFSFLILKE 

orf 135a . pep RISVYTQAVLLLGFAGVVLLLNPSFRSGQETAALAGLAGGAMSGWAYLKVRELSLAGEPG 
I I I I I I I I I I I ! I I I 1 ! I I I I i I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I 
orf 135-1 RISVYTQAVLLLGFAGWLLLNPSFRSGQETAALAGLAGGAMSGWAYLKVRELSLAGEPG 

orf 135a. pep WRVVFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSMTRAYKVGDKFT 

I I I I I I I I I I I i I I I I ! 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I ! I I 
orf 135-1 WRWFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSMTRAYKVGDKFT 

orf 135a. pep VASLSYMTVVFSALSAAFFLAEELFWQEILGMCIIILSGILSSIRPTAFKQRLQSLFRQR 

orf 135-1 VASLSYMTWFSALSAAFFLGEELFWQEILGMCIIILSGILSSIRPTAFKQRLQSLFRQR 

Homology with a predicted ORF fi-om N.sonorrhoeae 

ORF135 shows 97% identity over a 201aa overlap with a predicted ORF (ORF135ng) from 
N. gonorrhoeae: 

orf 135. pep GTGAMLLLFYAVTXLPLATGVTLSYTSSIF 30 

I I I I I I I I I I I i I i i : I I I I I I I I I I I I 
orfl35ng 3TVTLGAAAVLRRDTFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLTTGVTLSYTSSIF 335 

orf 135 . pep LAVFSFLILKERISVYTQAVLLLGFAGWLLLNPSFRSGQETAALAGLAGGAMSGWAYLK 90 

orfl35ng LAVFSFLILKERISVYTQAVLLLGFAGWLLLNPSFRSGQEPAALAGLAGGAMSGWAYLK 395 

orf 135 .pep VRELSLAGEPGWRWFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSM 150 

orfl35ng VRELSLAGEPGWRWFYLSATGVAMSSVWATLTGWHTLSFPSAVYLSGIGVSALIAQLSM 455 

orf 135. pep TRAYKVGDKFTVASLSYMTWFSALSAAFFLGEELFWQEILGMCIIISAVF 201 

I 1 ! I I I I I 1 I I I I I I I I I I I I I I 1 I I I I i I I M I I I I I I I I 1 I I I I I I : I 
orfl35ng TRAYKVGDKFTVA3LSYMTWF3ALSAAFFLGEELFWQEILGMCIIISAAF 506 

An ORF135ng nucleotide sequence <SEQ ID 545> was predicted to encode a protein having amino 
acid sequence <SEQ ID 546>: 

1 MPSEKAFRRH LRTASFQGLH LHHFHQKVGK CGIIGFGIHI FPTLLPA AQG 

51 ILDIQLGLFR IDFAALAVYR RTQVDFIHTV IDGIASDQAF SEWQILRRL 

101 NLGHFTDTHL lAQARRFIAD FGNIRPMRRG EAKTFCRCFR FDGIDGIHGD 

151 FRQCGHINRL APGKDCRNGK RDKVFFHTRH YNQVCLEKTN CSARKIKFRH 

201 QKQAKTHSTS LAARFTIRPS LSQRPFMDTA KKDILGS GWM LVAAACFTVM 

251 NVLIKEASAK FALGSGELVF WRMLFSTVTL GAAAVLRRDT FRTPHWKNHL 

301 NRS MVGTGAM LLLFYAVTHL PLTTGVT LSY TSSIFLAVFS FLIL KERISV 

351 YTQ AVLLLGF AGWLLLNPS F RSGQEPAAL AGLAGGAMSG WAYLKVRELS 

401 LAGEPGWRVV FYLSATGVAM SSVWATLTGW HTLS FPSAVY LSGIGVSALI 

451 AQLSMTRAYK VGDKFTVAS L SYMTVVFSAL SAAFFL GEE L FWQEILGMCI 

501 IISAAF * 

Further work revealed the following gonococcal sequence <SEQ ID 547>: 

1 ATGGATACCG CAAAAAAAGA CATTTTAGGA TCGGGCTGGA TGCTGGTGGC 

51 GGCGGCCTGC TTCACCGTTA TGAACGTATT GATTAAAGAG GCATCGGCAA 

101 AATTTGCCCT CGGCAGCGGC GAATTGGTCT TTTGGCGCAT GCTGTTTTCA 

151 ACCGTTACGC TCGGTGCTGC CGCCGTATTG CGGCGCGACA CCTTCCGCAC 

2 01 GCCCCATTGG AAAAACCACT TAAACCGCAG TATGGTCGGG ACGGGGGCGA 

251 TGCTGCTGCT GTTTTACGCG GTAACGCATC TGCCTTTGAC AACCGGCGTT 

301 ACCCTGAGTT ACACCTCGTC GATTTTTttg GCGGTATTTT CCTTCCTGAT 

351 TTTGAAAGAA CGGATTTCCG TTTACACGCA GGCGGTGCTG CTCCTTGGTT 

401 TTGCCGGCGT GGTATTGCTG CTTAATCCCT CGTTCCGCAG CGGTCAGGAA 

4 51 CCGGCGGCAC TCGCCGGGCT GGCGGGCGGC GCGATGTCCG GCTGGGCGTA 

501 TTTGAAAGTG CGCGAACTGT CTTTGGCGGG CGAACCCGGC TGGCGCGTCG 

551 TGTTTTACCT TTCCGCAACC GGCGTGGCGA TGTCGTCggt ttgggcgacg 

601 Ctgaccggct ggCACAcccT GTCCTTTcca tcggcagttt ATCtgtCGGG 
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651 CATCGGCGTG tccgcgCtgA TTGCCCAaCT GtcgatgAcg cGCGcctaca 

701 aaGTCGGCGA CAAATTCACG GTTGCCTCGC tttcctaTAt gaccgtcGTC 

751 TTTTCCGCCC TGTCTGCCGC ATTTTTTCTg ggcgaagagc tttTCtggCA 

801 GGAAATACTC GGTATGTGCA TCATTAtccT CAGCGGCATT TTGAGCAGCA 

851 TCCGCCCCAT TGCCTTCAAA CAGCGGCTGC AAGCCCTCTT CCGCCAAAGA 

901 TAA 

This corresponds to the amino acid sequence <SEQ ID 548; ORF135ng-l>: 

1 MDTAKKDILG SGWMLVAAA C FTVMNVLIKE ASAKFALGSG ELVFWRMLFS 

51 TVTLGAAAVL RRDTFRTPHW KNHLNRS MVG TGAMLLLFYA VTHL PLTTGV 

101 T LSYTSSIFL AVFSFLIL KE RISVYTQ AVL LLGFAGVVLL LNPSF RSGQE 

151 PAALAGLAGG AMSGWAYLKV RELSLAGEPG WRWFYLSAT GVAMSSVWAT 

201 LTGWHTLS FP SAVYLSGIGV SALIA QLSMT RAYKVGDKFT VAS LSYMTW 

251 FSALSAAFFL GEELFWQ EIL GMCIIILSGI LSSI RPIAFK QRLQALFRQR 

301 * 

ORF135ng-l and ORF135-1 show 97.0% identity in 300 aa overlap: 

orfl35ng-l.pep MDTAKKDILGSGWMLVAAACFTVMNVLIKEASAKFALGSGELVFWRMLFSTVTLGAAAVL 

or f 13 5-1 MDTAKKDILGSGWMLVAAACFTIMNVLIKEASAKFALGSGELVFWRMLFSTVALGAAAVL 

orf 135ng-l .pep RRDTFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLTTGVTLSYTSSIFLAVFSFLILKE 
I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I 
orf 135-1 RRDXFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLATGVTLSYTSSIFLAVFSFLILKE 

or f 135ng-l . pep RISVYTQAVLLLGFAGWLLLNPSFRSGQEPAALAGLAGGAMSGWAYLKVRELSLAGEPG 

orf 135-1 RISVYTQAVLLLGFAGVVLLLNPSFRSGQETAALAGLAGGAMSGWAYLKVRELSLAGEPG 

orf 135ng-l . pep WRVVFYLSATGVAMSSVWATLTGWHTLSFPSAVYLSGIGVSALIAQLSMTRAYKVGDKFT 

orf 135-1 WRWFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSMTRAYKVGDKFT 

orf 135ng-l . pep VASLSYMTVVFSALSAAFFLGEELFWQEILGMCIIILSGILSSIRPIAFKQRLQALFRQR 

orf 135-1 VASLSYMTVVFSALSAAFFLGEELFWQEILGMCIIILSGILSSIRPTAFKQRLQSLFRQR 

Based on this analysis, including the presence of several putative transmembrane domains in the 
gonococcal protein, it is predicted that the proteins from N. meningitidis and N.gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 66 



The following DNA sequence was identified in N. meningitidis <SEQ ID 549>: 

1 ATGAAGCGGC GTATAGCCGT CTTCGTCCTG TTCCCGCAGA TAATCCGAGT 

51 TTTGGGACAA CTGTTGCCGA AAATCGTCAA TACAGTTCCG GCACATCGGA 

101 TGCTCTTCCA GATTTTCGGG ATGTTCTTTT TCTTCATACA CCAGCAATAT 

151 CTGCCCGGGA TCGCCGAAAT CGATTCCCCA TGCGGCATCG TGTTCGGTGC 

201 GCTCCTCTTC CGTCATCTGC CCGCGCATTG CCTGTATGGT AAAGCCGCCG 

251 TAGGGGATGC CgTTGCACAC GAACATCCAG TCGCTGATGT CGTCAACCGG 

301 AACGCAAACG cTTTCGCCTT GTTCGACATT GGTCAGTTCG CCsGGTTCAT 

351 TGTTCAGCAC ACCGTAAATA TAAAGACCGT CAAAATAAAT ATCGTCGATC 

401 CACATATGTT CGCAAATTTC GCCGTCTTCG CCGTCTTGGA AAAAAGGGAC 

4 51 TTTGACCATG GCAAAATCCA AGGCGGAAAT AATGCGGCGG CGTTCCCAAA 

501 AAAGcTCGCG CCAAAAATAT TTGAATGTTT TACGGGCGCG TTCGTCGGCA 

551 CGGTTTACCG GTTCGTCTGC CTGTTCTACA TAATAAATGA CGGAATCGCC 

601 CATCATATCT GCTCCTCAAC GTGTACGGTA TCTGTTTGCA CCTTACTGCG 

651 GCTTTCTgcC kTCGGCATCC GATTCGGATT TGAAAAGTTC iranrwyATTCG 

701 GAATAG 

This corresponds to the amino acid sequence <SEQ ID 550; ORF136>: 

1 MKRRIAVFVL FPQIIRVLGQ LLPKIVNTVP AHRMLFQIFG MFFFFIHQQY 

51 LPGIAEIDSP CGIVFGALLF RHLPAHCLYG KAAVGDAVAH EHPVADWNR 
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101 NANAFALFDI GQFAXFIVQH TVNIKTVKIN IVDPHMFANF AVFAVLEKRD 
151 FDHGKIQGGN NAAAFPKKLA PKIFECFTGA FVGTVYRFVC LFYIINDGIA 
201 HHSAPQRVRY LFAPYCGFLP SASDSDLKSS XXSE* 

Further work revealed the complete nucleotide sequence <SEQ ID 551>: 

1 ATGATGAAGC GGCGTATAGC CGTCTTCGTC CTGTTCCCGC AGATAATCCG 

51 AGTTTTGGGA CAACTGTTGC CGAAAATCGT CAATACAGTT CCGGCACATC 

101 GGATGCTCTT CCAGATTTTC GGGATGTTCT TTTTCTTCAT ACACCAGCAA 

151 TATCTGCCCG GGATCGCCGA AATCGATTCC CCATGCGGCA TCGTGTTCGG 

201 TGCGCTCCTC TTCCGTCATC TGCCCGCGCA TTGCCTGTAT GGTAAAGCCG 

251 CCGTAGGGGA TGCCGTTGCA CACGAACATC CAGTCGCTGA TGTCGTCAAC 

301 CGGAACGCAA ACGCTTTCGC CTTGTTCGAC ATTGGTCAGT TCGCCGGGTT 

351 CATTGTTCAG CACACCGTAA ATATAAAGAC CGTCAAAATA AATATCGTCG 

4 01 ATCCACATAT GTTCGCAAAT TTCGCCGTCT TCGCCGTCTT GGAAAAAAGG 

451 GACTTTGACC ATGGCAAAAT CCAAGGCGGA AATAATGCGG CGGCGTTCCC 

501 AAAAAAGCTC GCGCCAAAAA TATTTGAATG TTTTACGGGC GCGTTCGTCG 

551 GCACGGTTTA CCGGTTCGTC TGCCTGTTCT ACATAATAAA TGACGGAATC 

601 GCCCATCATT CTGCTCCTCA ACGTGTACGG TATCTGTTTG CACCTTACTG 

651 CGGCTTTCTG CCTTCGGCAT CCGATTCGGA TTTGAAAAGT TCCAAATATT 

7 01 CGGAATAG 

This corresponds to the amino acid sequence <SEQ ED 552; ORF136-l>: 

1 MMKRR IAVFV LFPQIIRVLG QL LPKIVNTV PAHRMLFQIF GMFFFFIHQQ 

51 YLPGIAEIDS PCGIVFGALL FRHLPAHCLY GKAAVGDAVA HEHPVADWN 

101 RNANAFALFD IGQFAGFIVQ HTVNIKTVKI NIVDPHMFAN FAVFAVLEKR 

151 DFDHGKIQGG NNAAAFPKKL APKIFECFT G AFVGTVYRFV CLFYII NDGI 

201 AHHSAPQRVR YLFAPYCGFL PSASDSDLKS SKYSE* 

Computer analysis of this amino acid sequence gave the following results: 
Homologv with a predicted ORF from N.meninsitidis fstrain A) 

ORF136 shows 71 .7% identity over a 237aa overlap with an ORF (ORF136a) from sfrain A of A^. 
meningitidis: 

10 20 30 40 50 59 

orfl36.pep MKRRIAVFVLFPQIIRVLGQLLPKIVNTVPAHRMLFQIFGMFFFFIHQQYLPGIAEIDS 

orfl36a MMKRRIAVFVLLMQKIRILGQLLPKIVNTVPAHRMLFQXFGMFFFFIHQQYLPGIAEIDS 
10 20 30 40 50 60 

50 70 80 90 100 110 119 

or f 13 6. pep PCGIVFGALLFRHLPAHCLYGKAAVGDAVAHEHPVADWNRNANAFALFDIGQFAXFIVQ 

or f 13 6a PCGIVFGTLLFRHXSTHCLYGKAAVGNAVAHEHPVADVVNRNANAFALFDIGQFAGFIVQ 
70 80 90 100 110 120 

120 130 140 150 150 170 179 

orfl36.pep HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKIFECFTG 

or f 136a HAINVKTVKINIVDPHMFANFAXFAVLEKRALTMAKSKXXXMRRRSQKSSRQKYLNVLRA 
130 140 150 160 170 180 

180 190 200 210 220 230 

orfl36.pep AFVGTVYRFVCLFYIINDGIAHH — SAPQRVRYLFAPYCGFLPSASDSDLKSSXXSEX 

orfl36a R SPARFTGLSACSTXXMTESPIISAPQRVRYLFAPYCGFLPSASDSDLKSSKYSEX 

190 200 210 220 230 

The complete length ORF 136a nucleotide sequence <SEQ ID 553> is: 

1 ATGATGAAGC GGCGTATAGC CGTCTTCGTC CTGCTCATGC AGAAAATCCG 

51 GATTTTGGGA CAACTGTTGC CGAAAATCGT CAATACAGTT CCGGCACATC 

101 GGATGCTCTT CCAGATNTTC GGGATGTTCT TTTTCTTCAT ACACCAGCAA 

151 TACCTGCCCG GGATCGCCGA AATCGATTCC CCATGCGGCA TCGTGTTCGG 

201 TACGCTCCTC TTCCGTCATC NGTCCACGCA TTGCCTGTAT GGTAAAGCCG 

251 CCGTAGGGAA TGCCGTTGCA CACGAACATC CAGTCGCTGA TGTCGTCAAC 
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301 CGGAACGCAA ACGCTTTCGC CTTGTTCGAC ATTGGTCAGT TCGCCGGGTT 

351 CATTGTTCAG CACGCCATAA ATGTAAAGAC CGTCAAAATA AATATCGTCG 

401 ATCCACATAT GTTCGCAAAT TTCGCCNTCT TCGCCGTCTT GGAAAAAAGG 

451 GCTTTGACCA TGGCAAAATC TAAGGNGNNA NNGATGCGGC GGCGTTCCCA 

501 AAAAAGCTCG CGCCAAAAAT ATTTGAATGT TTTGCGGGCG CGTTCGCCGG 

551 CACGGTTTAC CGGTTTGTCT GCCTGTTCTA CATAATAAAT GACGGAATCG 

601 CCCATCATAT CTGCTCCTCA ACGTGTACGG TATCTGTTTG CACCTTACTG 

651 CGGCTTTCTG CCTTCGGCAT CCGATTCGGA TTTGAAAAGT TCCAAATATT 

701 CGGAATAG 

This encodes a protein having amino acid sequence <SEQ ID 554>: 

1 MMKRR IAVFV LLMQKIRILG QL LPKIVNTV PAHRMLFQXF GMFFFFIHQQ 

51 YLPGIAEIDS PCGIVFGTLL FRHXSTHCLY GKAAVGNAVA HEHPVADVVN 

101 RNANAFALFD IGQFAGFIVQ HAINVKTVKI NIVDPHMFAN FAXFAVLEKR 

151 ALTMAKSKXX XMRRRSQKSS RQKYLNVLRA RSPARFTGLS ACST**MTES 

201 PIISAPQRVR YLFAPYCGFL PSASDSDLKS SKYSE* 

ORF136a and ORF136-1 show 73.1% identity in 238 aa overlap: 

10 20 30 40 50 60 

orfl36a.pep MMKRRIAVFVLLMQKIRILGQLLPKIVNTVPAHRMLFQXFGMFFFFIHQQYLPGIAEIDS 
I I i I I I I I I I I : I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 136-1 iXIMKRRIAVFVLFPQIIRVLGQLLPKIVNTVPAHRMLFQIFGMFFFFIHQQYLPGIAEIDS 

10 20 30 40 50 60 



orfl36a.pep 



PCGIVFGTLLFRHXSTHCLYGKAAVGNAVAHEHPVADWNRNANAFALFDIGQFAGFIVQ 
PCGIVFGALLFRHLPAHCLYGKAAVGDAVAHEHPVADVVNRNANAFALFDIGQFAGFIVQ 



HAINVKTVKINIVDPHMFANFAXFAVLEKRALTMAKSKXXXMRRRSQKSSRQKYLNVLRA 
HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKIFECFTG 



R SPARFTGLSACSTXXMTESPIISAPQRVRYLFAPYCGFLPSASDSDLKSSKYSEX 

AFVGTVYRFVCLFYIINDGIAHH SAPQRVRYLFAPYCGFLPSASDSDLKSSKYSEX 



Homology with a predicted ORf from N.eonorrhoeae 

ORF136 shows 92.3% identity over a 234aa overlap with a predicted ORF (ORF136ng) from 
N. gonorrhoeae: 



orf 136. pep MKRRIAVFVLFPQIIRVLGQLLPKIVNTVPAHRMLFQIFGMFFFFIHQQYLPGIAEIDS 

orfl36ng MMKRRIAVFVLLMQKIRILGQLLPKIVNTVPAHRMLFQIFGMFFFFIHRQYLPGIAEIDS 

or f 1 3 6 . pep PCGIVFGALLFRHLPAHCLYGKAAVGDAVAHEHPVADVVNRNANAFALFDIGQFAXFIVQ 

orfl3 6ng PGGIVFGTLLFRHLSAHCLYGKAAVGDAVAHEHPVADVANRNANAFALFDIGQSAGFIVQ 

orfl3 6.pep HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKIFECFTG 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I 
Orfl36ng HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKVFECFTG 

orf 13 6. pep AFVGTVYRFVCLFYIINDGIAHHSAPQRVRYLFAPYCGFLPSASDSDLKSSXXSE 2 34 

orfl3 6ng AFAGTVYRFVCLFYIINDGIAHHTAPQRVRYLFAPYRGFLPPASDSDLKSSKYSE 235 

The complete length ORF136ng nucleotide sequence <SEQ ID 55 5> is: 



1 ATGATGAAGC GGCGTATAGC CGTCTTCGTC CTGCTCATGC AGAAAATCCG 
51 GATTTTGGGA CAACTGTTGC CGAAAATCGT CAATACAGTT CCGGCACATC 
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101 GGATGCTCTT CCAAATTTTC GGGATGTTCT TTTTCTTCAT ACACCGGCRA 

151 TACCTGCCCG GGATCGCCGA AATCGATTCC CCAGGCGGTA TCGTGTTCGG 

201 TACGCTCCTC TTCCGTCATC TGTCCGCGCA TTGCCTGTAC GGTAAAGCCG 

251 CCGTAGGGGA TGCCGTTGCA CACGAACATC CAGTCGCTGA TGTCGCCAAC 

301 CGGAACGCAA ACGCTTTCGC CTTGTTCGAC ATTGGTCAGT CCGCCGGGTT 

351 CATTGTTCAG CACACCGTAA ATATAAAGAC CGTCAAAATA AATATCGTCG 

401 ATCCACATAT GTTCGCAAAT TTCGCCGTCT TCGCCGTCTT GGAAAAAAGG 

451 GACTTTGACC ATGGCAAAAT CCAAGGCGGA AATAATGCGG CGGCGTTCCC 

501 AAAAAAGCTC GCGCCAAAAG TATTTGAATG TTTTACGGGC GCGTTCGCCG 

551 GCACGGTTTA CCGGTTCGTC TGCCTGTTCT ACATAATAAA TGACGGAATC 

601 GCCCATCATA CTGCTCCTCA ACGTGTACGG TATCTGTTTG CACCTTACCG 

651 CGGTTTTCTA CCTCCGGCAT CCGATTCGGA TTTGAAAAGT TCCAAATATT 

701 CGGAATAG 

This encodes a protein having amino acid sequence <SEQ ID 556>: 

1 MMKRR IAVFV LLMQKIRILG QL LPKIVNTV PAHRMLFQIF GMFFFFIHRQ 

51 YLPGIAEIDS PGGIVFGTLL FRHLSAHCLY GKAAVGDAVA HEHPVADVAN 

101 RNANAFALFD IGQSAGFIVQ HTVNIKTVKI NIVDPHMFAN FAVFAVLEKR 

151 DFDHGKIQGG NNAAAFPKKL APKVFECFT G AFAGTVYRFV CLFYII NDGI 

201 AHHTAPQRVR YLFAPYRGFL PPASDSDLKS SKYSE* 

ORF136ng and ORF136-1 show 93.6% identity in 235 aa overlap: 





:fl36ng 


MMKRRIAVFVLLMQKIRILGQLLPKIVNTVPAHRMLFQIFGMFFFFIHRQYLPGIAEIDS 




•■fl36-l 


MMKRRIAVFVLFPQIIRVLGQLLPKIVNTVPAHRMLFQIFGMFFFFIHQQYLPGIAEIDS 




:fl36ng 


PGGIVFGTLLFRHLSAHCLYGKAAVGDAVAHEHPVADVANRNANAFALFDIGQSAGFIVQ 
PCGIVFGALLFRHLPAHCLYGKAAVGDAVAHEHPVADWNRNANAFALFDIGQFAGFIVQ 




:fl36-l 




:fl36ng 


HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKVFECFTG 




:fl36-l 


HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKIFECFTG 




:fl36ng 


AFAGTVYRFVCLFYIINDGIAHHTAPQRVRYLFAPYRGFLPPASDSDLKSSKYSEX 




:fl36-l 


AFVGTVYRFVCLFYIINDGIAHHSAPQRVRYLFAPYCGFLPSASDSDLKSSKYSEX 



Based on the presence of the putative transmembrane domains in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 67 



The following partial DNA sequence was identified in N.meningitidis <SEQ ID 557>: 

1 ATGGAAAATA TGGTAACGTT TTCAAAAATC AGACCGCTTT TGGCAATCGC 

51 CGCCGCCGCG TTGCTTGCCG CC . TGCGGAC GGCGGGAAAT AATGCTGTCC 

101 GCAAGCCGGT GCAAACCGCC AAACCCGCCG CAGTGGTCGG TTTGGCACTC 

151 GGTGGCGGCG CATCTAAAGG ATTTGCCCAT GTAGGTATTA TTAAGGTTTT 

2 01 GAAAGAAAAC GGTATTCCTG TGAAGGTGGT TACCGGCACC TCCGCAGGTT 

251 CGATTGTCGG CAACCTTTTT GCATCGGGTA TGTCGCCCGA CCGCCTCGAA 

301 TTGGAAGCCG AAATTTTAGG CAAAACCGAT TTGGTCGATT TAACCTTGTC 

351 CACCAATGGG TTTATCAAAG GCGCAAAGCT GCAAAATTAC ATCAACCGAA 

401 AACTCCGCGG CATGCAGATT CAGCAGTTTC CCATCAAATT TGCCGCC . . 

This corresponds to the amino acid sequence <SEQ ID 558; ORF137>: 

1 MENMVTFSKI RPLLAIAAAA LLAAXRTAGN NAVRKPVQTA KPAAWGLAL 

51 GGGASKGFAH VGIIKVLKEN GIPVKWTGT SAGSIVGNLF ASGMSPDRLE 

101 LEAEILGKTD LVDLTLSTNG FIKGAKLQNY INRKLRGMQI QQFPIKFAA. . 



Further work revealed the complete nucleotide sequence <SEQ ID 559>; 
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1 ATGGAAAATA TGGTAACGTT TTCAAAAATC AGACCGCTTT TGGCAATCGC 

51 CGCCGCCGCG TTGCTTGCCG CCTGCGGCAC GGCGGGAAAT AATGCTGTCC 

101 GCAAGCCGGT GCAAACCGCC AAACCCGCCG CAGTGGTCGG TTTGGCACTC 

151 GGTGGCGGCG CATCTAAAGG ATTTGCCCAT GTAGGTATTA TTAAGGTTTT 

201 GAAAGAAAAC GGTATTCCTG TGAAGGTGGT TACCGGCACA TCGGCAGGTT 

251 CGATTGTCGG CAGCCTTTTT GCATCGGGTA TGTCGCCCGA CCGCCTCGAA 

301 TTGGAAGCCG AAATTTTAGG CAAAACCGAT TTGGTCGATT TAACCTTGTC 

351 CACCAGTGGT TTTATCAAAG GCGAAAAGCT GCAAAATTAC ATCAACCGAA 

401 AAGTCGGCGG CAGGCAGATT CAGCAGTTTC CCATCAAATT TGCCGCCGTT 

451 GCTACTGATT TTGAAACCGG CAAGGCCGTC GCTTTCAATC AGGGGAATGC 

501 CGGGCAGGCT GTGCGCGCTT CCGCCGCCAT TCCCAATGTG TTCCAACCCG 

551 TTATCATCGG CAGGCATACA TATGTTGACG GCGGTCTGTC GCAGCCCGTG 

601 CCCGTCAGTG CCGCCCGGCG GCAGGGGGCG AATTTCGTGA TTGCCGTCGA 

651 TATTTCCGCC CGTCCGGGCA AAAACATCAG CCAAGGTTTC TTCTCTTATC 

701 TCGATCAGAC GCTGAACGTA ATGAGCGTTT CTGCGTTGCA AAATGAGTTG 

751 GGGCAGGCGG ATGTGGTTAT CAAACCGCAG GTTTTGGATT TGGGTGCAGT 

801 CGGCGGATTC GATCAGAAAA AACGCGCCAT CCGGTTGGGT GAGGAGGCAG 

851 CACGTGCCGC ATTGCCTGAA ATCAAACGCA AACTGGCGGC ATACCGTTAT 

901 TGA 

This corresponds to the amino acid sequence <SEQ ID 560; ORF137-l>: 

1 MENMVTFSKI RPLLAIAAAA LLAA CGTAGN NAVRKPVQTA KPAAWGLAL 

51 GGGASKGFAH VGIIKVLKEN GIPVKWTGT SAGSIVGSLF ASGMSPDRLE 

101 LEAEILGKTD LVDLTLSTSG FIKGEKLQNY INRKVGGRQI QQFPIKFAAV 

151 ATDFETGKAV AFNQGNAGQA VRASAAIPNV FQPVIIGRHT YVDGGLSQPV 

201 PVSAARRQGA NFVIAVDISA RPGKNISQGF FSYLDQTLNV MSVSALQNEL 

251 GQADWIKPQ VLDLGAVGGF DQKKRAIRLG EEAARAALPE IKRKLAAYRY 

301 * 

Computer analysis of this amino acid sequence gave the following results; 



Homology with a predicted ORF from N.meninsitidis (strain A) 

ORF137 shows 93.3% identity over a 149aa overlap with an ORF (ORF137a) from sfrain A ofN. 
meningitidis: 

10 20 30 40 50 60 

orfl37 .pep MENMVTFSKIRPLLAIAAAALLAAXRTAGNNAVRKPVQTAKPAAWGLALGGGASKGFAH 
I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I : I I I I I I i ! I I I I M I I I I I I I I I I I I I 
orfl37a MENMVTFSKIRPLLAIAAAALLAACGTAGNNAARKPVQTAKPAAWGLALGGGASKGFAH 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 137 . pep VGIIKVLKENGIPVKVVTGTSAGSIVGNLFASGMSPDRLELEAEILGKTDLVDLTLSTNG 

orfl37a VGIIKVLKENGIPVKVVTGTSAGSIVGSLFASGMSPDRLELEAEILGKTDLVDLTLSTSG 
70 80 90 100 110 120 



orf 137 .pep FIKGAKLQNYINRKLRGMQIQQFPIKFAA 

orf 137a FIKGEKLQNYINRKVGGRRIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV 
130 140 150 160 170 180 

The complete length ORF137a nucleotide sequence <SEQ ID 561> is: 



1 ATGGAAAATA TGGTAACGTT TTCAAAAATC AGACCGCTTT TGGCAATCGC 

51 CGCCGCCGCG TTGCTTGCCG CCTGCGGCAC GGCGGGAAAT AATGCTGCCC 

101 GCAAGCCGGT GCAAACCGCC AAACCCGCCG CAGTGGTCGG TTTGGCACTC 

151 GGTGGCGGCG CATCTAAAGG ATTTGCCCAT GTAGGTATTA TTAAGGTTTT 

201 GAAAGAAAAC GGTATTCCTG TGAAGGTGGT TACCGGCACA TCGGCAGGTT 

251 CGATAGTCGG CAGCCTTTTT GCATCGGGTA TGTCGCCCGA CCGCCTCGAA 

3 01 TTGGAAGCCG AAATTTTAGG TAAAACCGAT TTGGTCGATT TAACCTTGTC 
351 CACCAGTGGT TTTATCAAAG GCGAAAAGCT GCAAAATTAC ATCAACCGAA 

4 01 AAGTCGGCGG CAGGCGGATT CAGCAGTTTC CCATCAAATT TGCCGCCGTT 
4 51 GCTACTGATT TTGAAACCGG CAAGGCCGTC GCTTTCAATC AAGGGAATGC 
501 CGGGCAGGCT GTGCGCGCTT CCGCCGCCAT TCCCAATGTG TTCCAACCCG 
551 TTATCATCGG CAGGCATACA TATGTTGACG GCGGTCTGTC GCAGCCCGTG 
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601 CCCGTCAGTG CCGCCCGGCG GCANGNNNNG NATNTCGTGA TTGCCGTCGA 

651 TATTTCCGCC CGTCCGAGCA AAAACATCAG CCAAGGCTTC TTCTCTTATC 

701 TCGATCAGAC GCTGAACGTA ATGAGCGTTT CCGCGTTGCA AAATGAGTTG 

7 51 GGGCAGGCGG ATGTGGTTAT CAAACCGCAG GTTTTGGATT TGGGTGCAGT 

801 CGGCGGATTC GATCAGAAAA AACGCGCCAT CCGGTTGGGT GAGGAGGCAG 

851 CACGTGCCGC ATTGCCTGAA ATCAAACGCA AACTGGCGGC ATACCGTTAT 

901 TGA 

This encodes a protein having amino acid sequence <SEQ ID 562>: 

1 MENMVTFSKI RPLLAIAAAA LLAA CGTAGN NAARKPVQTA KPAAWGLAL 

51 GGGASKGFAH VGIIKVLKEN GIPVKWTGT SAGSIVGSLF ASGMSPDRLE 

101 LEAEILGKTD LVDLTLSTSG FIKGEKLQNY INRKVGGRRI QQFPIKFAAV 

151 ATDFETGKAV AFNQGNAGQA VRASAAIPNV FQPVIIGRHT YVDGGLSQPV 

201 PVSAARRXXX XXVIAVDISA RPSKNISQGF FSYLDQTLNV MSVSALQNEL 

251 GQADVVIKPQ VLDLGAVGGF DQKKRAIRLG EEAARAALPE IKRKLAAYRY 

301 * 

ORF137a and ORF137-1 show 97.3% identity in 300 aa overlap: 



orf 137a . pep MENMVTFSKIRPLLAIAAAALLAACGTAGNNAARKPVQTAKPAAWGLALGGGASKGFAH 

orf 137-1 MENMVTFSKIRPLLAIAAAALLAACGTAGNNAVRKPVQTAKPAAWGLALGGGASKGFAH 

orf 137a . pep VGIIKVLKENGIPVKWTGTSAG3IVGSLFASGMSPDRLELEAEILGKTDLVDLTLSTSG 

orf 137-1 VGIIKVLKENGIPVKWTGTSAGSIVGSLFASGMSPDRLELEAEILGKTDLVDLTL3TSG 

orf 137a . pep FIKGEKLQNYINRKVGGRRIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV 

orf 137-1 FIKGEKLQNYINRKVGGRQIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV 

orf 137a . pep FQPVIIGRHTYVDGGLSQPVPVSAARRXXXXXVIAVDISARPSKNISQGFFSYLDQTLNV 

orfl37-l FQPVI IGRHT YVDGGLSQPVPVSAARRQGANFVIAVDI SARPGKNISQGFFSYLDQTLNV 

orf 137a . pep MSVSALQNELGQADWIKPQVLDLGAVGGFDQKKRAIRLGEEAARAALPEIKRKLAAYRY 

orfl37-l MS VS ALQNELGQAD VVI KPQVLDLGAVGGFDQKKRAI RLGEEAARAALPE IKRKLAAYRY 



Homology with a predicted ORF from N. gonorrhoeae 

ORF137 shows 89.9% identity over a 149aa overlap with a predicted ORF (ORF137ng) from 
N.gonorrhoeae: 

orf 137 .pep MENMVTFSKIRPLLAIAAAALLAAXRTAGNNAVRKPVQTAKPAAWGLALGGGASKGFAH 60 

1 I I I I I I I I 1 I : I I I I I I t I I I I I I I I I I : I I I I I I I I I I ! i I : I I I I I I I I I I I I 
orfl37ng MENMVTFSKIRSFLAIAAAALLAACGTAGNNAARKPVQTAKPAAWALALGGGASKGFAH 60 

orf 137 .pep VGIIKVLKENGIPVKWTGTSAGSIVGNLFASGMSPDRLELEAEILGKTDLVDLTLSTNG 120 

: I I : I I I I I I I I I I I I I I I I I I I I I 1 I : I : 1 I I I ! I I ! I I 1 I I I ! I I I I I I I I I I I I I : I 
orfl37ng IGIVKVLKENGIPVKVVTGTSAGSIVGSLLASGMSPDRLELEAEILGKTDLVDLTLSTSG 120 

orfl37.pep FIKGAKLQNYINRKLRGMQIQQFPIKFAA 149 

orfl37ng FIKGEKLQNYINRKVGGRQIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV 180 

The complete length ORF137ng nucleotide sequence <SEQ ID 563> is: 



ATGGAAAATA 
CGCCGCCGCG 
GCAAGCCGGT 
GGTGGCGGCG 
GAAAGAAAAC 
CGATAGTCGG 
TTGGAAGCCG 
CACCAGTGGT 
AAGTCGGCGG 
GCCACTGATT 



TGGTAACGTT 
TTGCTTGCCG 
GCAAACCGCC 
CATCTAAAGG 
GGTATTCCTG 
CAGCCTTTTG 
AGATTTTAGG 
TTTATCAAAG 
CAGGCAGATT 
TTGAAACCGG 



TTCAAAAATC 
CCTGCGGTAC 
AAACCCGCCG 
ATTTGCCCAT 
TGAAGGTGGT 
GCATCGGGTA 
TAAAACCGAT 
GCGAAAAGCT 
CAGCAGTTTC 
CAAGGCCGTC 



AGATCATTTT 
GGCGGGAAAC 
CAGTGGTCGC 
ATAGGAATTG 
TACCGGCACA 
TGTCGCCCGA 
TTAGTCGATT 
GCAAAATTAC 
CCATCAAATT 
GCTTTCAATC 



TGGCAATCGC 
AATGCCGCCC 
TTTGGCACTC 
TTAAGGTTTT 
TCGGCAGGTT 
CCGCCTCGAA 
TAACCTTGTC 
ATCAACCGAA 
TGCCGCCGTT 
AAGGGAATGC 
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5 



501 


CGGGCAGGCG 


GTTCGTGCTT 


CCGCCGCCAT 


TCCCAATGTG 


TTCCAGCCAG 


551 


TCATCATCGG 


CAGGCACAAA 


TATGTTGACG 


GCGGTCTGTC 


GCAGCCCGTG 


601 


CCCGTCAGTG 


CCGCTCGGCG 


GCAGGGGGCG 


AATTTCGTGA 


TTGCCGTCGA 


651 


TATTTCCGCA 


CGTCCGAGCA 


AAAATGTCGG 


TCAAGGTTTC 


TTCTCTTATC 


701 


TCGATCAGAC 


GCTGAACGTG 


ATGAGCGTTT 


CCGTGTTGCA 


AAACGAGTTG 


751 


gggcAGGCGG 


ATGTGGTTAT 


CAAACCGCag 


gtTTTGGATT 


TGGGTGCAGT 


801 


CGGCGGATTC 


GATCAGAAAA 


AGCGCGCCAT 


CCGGTTGGGC 


GAGGAGGCAG 


851 


CACGTGCCGC 


ATTGCCTGAA 


ATCAAACGCA 


AACTGGCGGC 


ATACCGTTAT 


901 


TGA 











10 This encodes a protein having amino acid sequence <SEQ ID 564>: 

1 MENMVTFSK I RSFLAIAAAA LLAAC GTAGN NAARKPVQTA KPAAWALAL 

51 GGGASKGFAH IGIVKVLKEN GIPVKVVTGT SAGSIVGSLL ASGMSPDRLE 

101 LEAEILGKTD LVDLTLSTSG FIKGEKLQNY INRKVGGRQI QQFPIKFAAV 

151 ATDFETGKAV AFNQGNAGQA VRASAAIPNV FQPVIIGRHK YVDGGLSQPV 

15 201 PVSAARRQGA NFVIAVDISA RPSKNVGQGF FSYLDQTLNV MSVSVLQNEL 

251 GQADVVIKPQ VLDLGAVGGF DQKKRAIRLG EEAARAALPE IKRKLAAYRY 

301 * 

ORF137ng and ORF137-1 show 96.0% identity in 300 aa overlap: 

orfl37ng MENMVTFSKIRSFLAIAAAALLAACGTAGNNAARKPVQTAKPAAVVALALGGGASKGFAH 
20 I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I : I I I I I I I I I I I I I 

orf 137-1 MENMVTFSKIRPLLAIAAAALLAACGTAGNNAVRKPVQTAKPAAVVGLALGGGASKGFAH 

orfl37ng IGIVKVLKENGIPVKVVTGTSAGSIVGSLLASGMSPDRLELEAEILGKTDLVDLTLSTSG 

25 orf 137-1 VGIIKVLKENGIPVKVVTGTSAGSIVGSLFASGMSPDRLELEAEILGKTDLVDLTLSTSG 

orfl37ng FIKGEKLQNYINRKVGGRQIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV 

orf 137-1 FIKGEKLQNYINRKVGGRQIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV 

30 

orfl37ng FQPVIIGRHKYVDGGLSQPVPVSAARRQGANFVIAVDISARPSKNVGQGFFSYLDQTLNV 

orf 137-1 FQPVIIGRHTYVDGGLSQPVPVSAARRQGANFVIAVDISARPGKNISQGFFSYLDQTLNV 

35 orfl37ng MSVSVLQNELGQADVVIKPQVLDLGAVGGFDQKKRAIRLGEEAARAALPEIKRKLAAYRY 

orf 137 MSVSALQNELGQADWIKPQVLDLGAVGGFDQKKRAIRLGEEAARAALPEIKRKLAAYRY 

Based on the presence of a predicted prokaryotic membrane lipoprotein lipid attachment site 
(underhned) in the gonococcal protein, it is predicted that the proteins from N. meningitidis and 
40 N.gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 



Example 68 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 565>: 

1 ATGTTTCGTT TACAATTCAG GCTGTTTCCC CCTTTGCGAA CCGCCATGCA 

45 51 CATCCTGTTG ACCGCCCTGC TCAAATGCCT CTCCCTGcTG CCGCTTTCCT 

101 GTCTGCACAC GCTGGGAAAC CGGCTCGGAC ATCTGGCGTT TTACCTTTTA 

151 AAGGAAGACC GCGCGCGCAT CGTCGCCmAT ATGCGGCAGG CGGGTTTGAA 

201 CCCCGACCCC AAAACGGTCA AAGCCGTTTT TGCGGAAACG GCAAAAGGCG 

251 GTTTGGAACT TGCCCCCGCG TTTTTCAGAA AACCGGAAGA CATAGAAACA 

50 301 ATGTTCAAAG CGGTACACGG CTGGGAACAT GTGCAGCAGG CTTTGGACAA 

351 ACACGAAGGG CTGCTATTC . . 

This corresponds to the amino acid sequence <SEQ ID 566; ORF138>: 



55 



1 MFRLQFRLFP PLRTAMHILL TALLKCLSLL PLSCLHTLGN RLGHLAFYLL 
51 KEDRARIVAX MRQAGLNPDP KTVKAVFAET AKGGLELAPA FFRKPEDIET 
101 MFKAVHGWEH VQQALDKHEG LLF 
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Further work revealed the complete nucleotide sequence <SEQ ID 567>: 

1 ATGTTTCGTT TACAATTCAG GCTGTTTCCC CCTTTGCGAA CCGCCATGCA 

51 CATCCTGTTG ACCGCCCTGC TCAAATGCCT CTCCCTGCTG CCGCTTTCCT 

101 GTCTGCACAC GCTGGGAAAC CGGCTCGGAC ATCTGGCGTT TTACCTTTTA 

151 AAGGAAGACC GCGCGCGCAT CGTCGCCAAT ATGCGGCAGG CGGGTTTGAA 

201 CCCCGACCCC AAAACGGTCA AAGCCGTTTT TGCGGAAACG GCAAAAGGCG 

251 GTTTGGAACT TGCCCCCGCG TTTTTCAGM AACCGGAAGA CATAGAAACA 

301 ATGTTCAAAG CGGTACACGG CTGGGAACAT GTGCAGCAGG CTTTGGACAA 

351 ACACGAAGGG CTGCTATTCA TCACGCCGCA CATCGGCAGC TACGATTTGG 

401 GCGGACGCTA CATCAGCCAG CAGCTTCCGT TCCCGCTGAC CGCCATGTAC 

451 AAACCGCCGA AAATCAAAGC GATAGACAAA ATCATGCAGG CGGGCAGGGT 

501 TCGCGGCAAA GGAAAAACCG CGCCTACCAG CATACAAGGG GTCAAACAAA 

551 TCATCAAAGC CCTGCGTTCG GGCGAAGCAA CCATCGTCCT GCCCGACCAC 

601 GTCCCCTCCC CTCAAGAAGG CGGGGAAGGC GTATGGGTGG ATTTCTTCGG 

651 CAAACCTGCC TATACCATGA CGCTGGCGGC AAAATTGGCA CACGTCAAAG 

701 GCGTGAAAAC CCTGTTTTTC TGCTGCGAAC GCCTGCCTGG CGGACAAGGT 

7 51 TTCGATTTGC ACATCCGCCC CGTCCAAGGG GAATTGAACG GCGACAAAGC 

801 CCATGATGCC GCCGTGTTCA ACCGCAATGC CGAATATTGG ATACGCCGTT 

851 TTCCGACGCA GTATCTGTTT ATGTACAACC GCTACAAAAT GCCGTAA 

This corresponds to the amino acid sequence <SEQ ID 568; ORF138-l>: 

1 MFRLQFRLFP PLRTAMH ILL TALLKCLSLL PLSC LHTLGN RLGHLAFYLL 

51 KEDRARIVAN MRQAGLNPDP KTVKAVFAET AKGGLELAPA FFRKPEDIET 

101 MFKAVHGWEH VQQALDKHEG LLFITPHIGS YDLGGRYISQ QLPFPLTAMY 

151 KPPKIKAIDK IMQAGRVRGK GKTAPTSIQG VKQIIKALRS GEATIVLPDH 

201 VPSPQEGGEG VWVDFFGKPA YTMTLAAKLA HVKGVKTLFF CCERLPGGQG 

251 FDLHIRPVQG ELNGDKAHDA AVFNRNAEYW IRRFPTQYLF MYNRYKMP* 

Computer analysis of this amino acid sequence gave the following results: 
Homologv with a predicted ORF from N.meninsitidis (strain K) 

ORF138 shows 99.2% identity over a 123aa overlap with an ORF (ORF138a) from strain A of A^. 
meningitidis: 

10 20 30 40 50 60 

or f 138. pep MFRLQFRLFPPLRTAMHILLTALLKCLSLLPL3CLHTLGNRLGHLAFYLLKEDRARIVAX 

or f 138a MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAN 



)rf 138 . pep MRQAGLNPDPKTVICAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG 
)rfl38a MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG 



orfl38.pep LLF 
I I I 

orfl38a LLFITPHIGSYDLGGRYISQQLPFPLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTSIQG 
130 140 150 160 170 180 

The complete length ORF138a nucleotide sequence <SEQ ID 569> is: 

1 ATGTTTCGTT TACAATTCAG GCTGTTTCCC CCTTTGCGAA CCGCCATGCA 

51 CATCCTGTTG ACCGCCCTGC TCAAATGCCT CTCCCTGCTG CCGCTTTCCT 

101 GTCTGCACAC GCTGGG7\AAC CGGCTCGGAC ATCTGGCGTT TTACCTTTTA 

151 AAGGAAGACC GCGCGCGCAT CGTCGCCAAT ATGCGTCAGG CAGGCATGAA 

201 TCCCGACCCC AAAACGGTCA AAGCCGTTTT TGCGGAAACG GCAAAAGGCG 

251 GTTTGGAACT TGCCCCCGCG TTTTTCAGAA AACCGGAAGA CATAGAAACA 

301 ATGTTCAAAG CGGTACACGG CTGGGAACAT GTGCAGCAGG CTTTGGACAA 

351 ACACGAAGGG CTGCTATTCA TCACGCCGCA CATCGGCAGC TACGATTTGG 

401 GCGGACGCTA CATCAGCCAG CAGCTTCCGT TCCCGCTGAC CGCCATGTAC 

451 AAACCGCCGA AAATCAAAGC GATAGACAAA ATCATGCAGG CGGGCAGGGT 

501 TCGCGGCAAA GGAAAAACCG CGCCTACCAG CATACAAGGG GTCAAACAAA 

551 TCATCAAAGC CCTGCGTTCG GGCGAAGCAA CCATCGTCCT GCCCGACCAC 



wo 99/24578 



-327- 



PCT/IB98/01665 



601 GTCCCCTCCC CTCAAGAAGG CGGGGAAGGC GTATGGGTGG ATTTCTTCGG 

651 CAAACCTGCC TATACCATGA CGCTGGCGGC AAAATTGGCA CACGTCAAAG 

701 GCGTGAAAAC CCTGTTTTTC TGCTGCGAAC GCCTGCCTGG CGGACAAGGT 

751 TTCGATTTGC ACATCCGCCC CGTCCAAGGG GAATTGAACG GCGACAAAGC 

801 CCATGATGCC GCCGTGTTCA ACCGCAATGC CGAATATTGG ATACGCCGTT 

851 TTCCGACGCA GTATCTGTTT ATGTACAACC GCTACAAAAT GCCGTAA 

This encodes a protein having amino acid sequence <SEQ ED 570>: 

1 MFRLQFRLFP PLRTAMH ILL TALLKCLSLL PLS CLHTLGN RLGHLAFYLL 

51 KEDRARIVAN MRQAGLNPDP KTVKAVFAET AKGGLELAPA FFRKPEDIET 

101 MFKAVHGWEH VQQALDKHEG LLFITPHIGS YDLGGRYISQ QLPFPLTAMY 

151 KPPKIKAIDK IMQAGRVRGK GKTAPTSIQG VKQIIKALRS GEATIVLPDH 

201 VPSPQEGGEG VWVDFFGKPA YTMTLAAKLA HVKGVKTLFF CCERLPGGQG 

251 FDLHIRPVQG ELNGDKAHDA AVFNRNAEYW IRRFPTQYLF MYNRYKMP* 

ORF138a and ORF138-1 show 99.7% identity over a 298aa overlap: 



orf 138a . pep MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAN 

orf 138-1 MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAN 

orf 138a . pep MRQAGMNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG 

orf 138-1 MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG 

orf 138a. pep LLFITPHIGSYDLGGRYISQQLPFPLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTSIQG 

orf 138-1 LLFITPHIGSYDLGGRYISQQLPFPLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTSIQG 

orf 138a . pep VKQIIKALRSGEATIVLPDHVPSPQEGGEGVWVDFFGKPAYTMTLAAKLAHVKGVKTLFF 

orf 138-1 VKQIIKALRSGEATIVLPDHVPSPQEGGEGVWVDFFGKPAYTMTLAAKLAHVKGVKTLFF 

or:J138a . pep CCERLPGGQGFDLHIRPVQGELNGDKAHDAAVFNRNAEYWIRRFPTQYLFMYNRYKMP 

orf 138-1 CCERLPGGQGFDLHIRPVQGELNGDKAHDAAVFNRNAEYWIRRFPTQYLFMYNRYICMP 

Homology with a predicted ORF from N.sonorrhoeae 

ORF138 shows 94.3% identity over a 123aa overlap with a predicted ORF (ORF138ng) 
N. gonorrhoeae: 



orfl38.pep 

orfl38ng 

orfl38.pep 

orfl38ng 

orfl38.pep 

orfl38ng 



MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAX 
MFRLQFRLFPPLRTAMHILLTALLKCLSLLSLSCLHTLGNRLGHLAFYLLKEDRARIVAN 
MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG 
MRQAGLNPDTQTVKAVFAETAKCGLELAPAFFKKPEDIETMFKAVHGWEHVQQALDKGEG 



LLFITPHIGSYDLGGRYISQQLPFHLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTGIQG 1 



The complete length ORF138ng nucleotide sequence <SEQ ID 571 > is: 



1 ATGTTTCGTT TACAATTCAG GCTGTTTCCC CCTTTGCGAA CCGCCATGCA 

51 CATCCTGTTG ACCGCCCTGC TCAAATGCCT CTCCCTGCTG TCGCTTTCCT 

101 GTCTGCACAC GCTGGGAAAC CGGCTCGGAC ATCTGGCGTT TTACCTTTTA 

151 AAGGAAGACC GCGCGCGCAT CGTCGCCAAT ATGCGGCAGG CGGGTTTGAA 

2 01 CCCCGACACG CAGACGGTCA AAGCCGTTTT TGCGGAAACG GCAAAATGCG 

251 GTTTGGAACT TGCCCCCGCG TTTTTCAAAA AACCGGAAGA CATCGAAACA 

301 ATGTTCAAAG CGGTACACGG CTGGGAACAC GTGCAGCAGG CTTTGGACAA 

351 GGGCGAAGGG CTGCTGTTCA TCACGCCGCA CATCGGCAGC TACGATTTGG 

4 01 GCGGACGCTA CATCAGCCAG CAGCTTCCGT TCCACCTGAC CGCCATGTAC 

4 51 AAGCCGCCGA AAATCAAAGC GATAGACAAA ATCATGCAGG CGGGCAGGGT 

501 GCGCGGCAAA GGCAAAACcg cgcccaccgg catACAAGGG GTCAAACAAA 

551 tcatcaAGGC CCTGCGCGCG GGCGAGGCAA CCAtcATCCT GCCCGACCAC 
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601 GTCCCTTCTC CGCRGGAagg cggCGGCGTG TGGGCGGATT TTTTCGGCAA 

651 ACCTGCRTAc acCATGACAC TGGCGGCAAA ATTGGCACAC GTCAAAGGCG 

7 01 TGAAAACCCT GTTTTTCTGC TGCGAACGCC TGCCCGACGG ACAAGGCTTC 
751 GTGTTGCACA TCCGCCCCGT CCAAGGGGAA TTGAACGGCA ACAAAGCCCA 

8 01 CGATGCCGCC GTGTTCAACC GCAATACCGA ATATTGGATA CGCCGTTTTC 
851 CGACGCAGTA TCTGTTTATG TACAACCGCT ATAAAACGCC GTAA 

This encodes a protein having amino acid sequence <SEQ ID 572>: 

1 MFRLQFRLFP PLRTAMH ILL TALLKCLSLL SLSC LHTLGN RLGHLAFYLL 

51 KEDRARIVAN MRQAGLNPDT QTVKAVFAET AKCGLELAPA FFKKPEDIET 

101 MFKAVHGWEH VQQALDKGEG LLFITPHIGS YDLGGRYISQ QLPFHLTAMY 

151 KPPKIKAIDK IMQAGRVRGK GKTAPTGIQG VKQIIKALRA GEATIILPDH 

201 VPSPQEGGGV WADFFGKPAY TMTLAAKLAH VKGVKTLFFC CERLPDGQGF 

251 VLHIRPVQGE LNGNKAHDAA VFNRNTEYWI RRFPTQYLFM YNRYKTP* 

ORF138ng and ORF138-1 show 94.3% identity over 299aa overlap: 

orf 138-1. pep MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAN 

orfl38ng MFRLQFRLFPPLRTAMHILLTALLKCLSLLSLSCLHTLGNRLGHLAFYLLKEDRARIVAN 

orf 138-1. pep MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG 

orfl38ng MRQAGLNPDTQTVKAVFAETAKCGLELAPAFFKKPEDIETMFKAVHGWEHVQQALDKGEG 

orf 138-1. pep LLFITPHIGSYDLGGRYISQQLPFPLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTSIQG 

orfl38ng LLFITPHIGSYDLGGRYISQQLPFHLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTGIQG 

orf 138-1 . pep VKQIIKALRSGEATIVLPDHVPSPQEGGEGVWVDFFGKPAYTMTLAAKLAHVKGVKTLFF 

orfl38ng VKQIIKALRAGEATIILPDHVPSPQEGG-GVWADFFGKPAYTMTLAAKLAHVKGVKTLFF 
orf 138-1 .pep CCERLPGGQGFDLHIRPVQGELNGDKAHDAAVFNRNAEYWIRRFPTQYLFMYNRYKMP 
orfl38ng CCERLPDGQGFVLHIRPVQGELNGNKAHDAAVFNRNTEYWIRRFPTQYLFMYNRYKTP 

In addition, ORF138ng is homologous to htrB protein from Pseudomonas fluorescens: 

gnll FID 1 6334283 (Y14568) htrB [Pseudomonas fluorescens] Length = 253 
0.8 bits (196), Expect = 9e-15 

= 49/151 (32%), Positives = 79/151 (51%), Gaps = 6/151 (3%) 





= 80 


Ident 






101 ^ 






Sbjct: 


94 I 


Query: 


160 I 


Sbjct: 


151 I 


Qu.^ry; 


220 ^ 


Sb j ct : 


209 



V+ K A + +G+ +IK +R G I D P P E G++ FF A 
VQLGNKVAASTKEGILSVIKEVRKGGQVGIPAD--PEPAESAGIFVPFFATQA 208 

KLAHVKGVKTLFFCCERLPDGQGF 250 

+ +F RLPDG G+ 

NMLAGGKAVGVFLHALRLPDGSGY 239 

Based on this analysis, including the presence of a putative transmembrane domain in the 
gonococcal protein, it was predicted that the proteins from N.meningitidis and N. gonorrhoeae, and 
their epitopes, could be usefiil antigens for vaccines or diagnostics, or for raising antibodies. 



ORF138-1 (57kDa) was cloned in the pGex vectors and expressed in E.coli, as described above. 
The products of protein expression and purification were analyzed by SDS-PAGE. Figure 14A 
55 shows the results of affinity purification of the GST-fusion protein. Purified GST-fiision protein 
was used to immunise mice, whose sera were used for ELISA (positive result) and FACS analysis 
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(Figure 14B). These experiments confirm that ORF138-1 is a surface-exposed protein, and that it 
is a usefiil immunogen. 



Example 69 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 573>: 

1 . . GCGTGGTCGG CCGGCGAATC GTGGCGTGTG TTAATGGAAA GTGAAACGTG 

51 GCATGCGGTG TGGAATACTT TGCGCTTCTC GGCGGCGGCG GTGTATGCGG 

101 CAGCGGTTTT GGGTGTGGTG TATGCGGCGC CGGCGCGGCG GTCGGCGTGG 

151 ATGCGCGGGC TGATGTTTTA GCCGTTTATG GTGTCGCCGG TTTGTGTTTC 

201 GGCGGGCGTG CTGCTGCTTT ATCCGCAGTG GACGGCTTCG TTGCCGTTGC 

251 TGCTGGCGAT GTATGCGCTG CTGGCGTATC CGTTTGTGGC AAAAGATGTT 

301 TTATCAGCCT GGGATGCACT GCCGCCGGAT TACGGCAGGG CGGCGGCGGG 

351 TTTGGGTGCA AACGGCTTTC AGACGGCATG CCGCATCACG TTCCCCCTCT 

4 01 TGAAACCGGC GTTGCGGCGC GGTCTGACTT TGGCGGCGGC AACCTGCGTG 

451 GGCGAATTTG CGGCGACATT GTTTCTGTCG CGTCCGGAAT GGCAGACGCT 

501 GACGACTTTG ATTTATGCCT ATTTGGGACG CGCGGGTGAG GATAATTACG 

551 CGCGGGCGAT GGTGCTG . . 

This corresponds to the amino acid sequence <SEQ ID 574; ORF139>: 

1 ..^WSAGESWRV LMESETWHAV WNTLRFSAAA VYAAAVLGW YAAPARRSAW 

51 MRGLMFXPFM VSPVCVSAGV LLLYPQWTAS LPLLLAMYAL LAYPFVAKDV 

101 LSAWDALPPD YGRAAAGLGA NGFQTACRIT FPLLKPALRR GLTLAAATCV 

151 GEFAATLFLS RPEWQTLTTL lYAYLGRAGE DNYARAWL. . 

Further work revealed the complete nucleotide sequence <SEQ ID 575>: 

1 ATGGATGGAC GGCGTTGGGT GGTATGGGGT GCTTTTGCCC TGCTGCCTTC 

51 GGCTTTTTTG GCGGTAATGG TCGTTGCGCC TTTGTGGGCG GTGGCGGCGT 

101 ATGACGGTTT GGCGTGGCGC GCGGTGCTGT CGGATGCCTA TATGCTCAAA 

151 CGTTTGGCGT GGACGGTATT TCAGGCAGCG GCAACCTGTG TGCTGGTGCT 

201 GCCTTTGGGC GTGCCTGTCG CGTGGGTGCT GGCGCGGCTG GCGTTTCCGG 

251 GGCGGGCTTT GGTGCTGCGC CTGCTGATGC TGCCTTTTGT GATGCCCACG 

301 TTGGTGGCGG GCGTGGGCGT GCTGGCCCTG TTCGGGGCGG ACGGGCTGTT 

351 GTGGCGCGGC AGGCAGGATA CGCCGTATCT GTTGTTGTAC GGCAATGTGT 

401 TTTTCAACCT TCCTGTGTTG GTCAGGGCGG CGTATCAGGG GTTTGTGCAA 

451 GTGCCTGCGG CACGGCTTCA GACGGCACGG ACGTTGGGCG CGGGGGCGTG 

501 GCGGCGGTTT TGGGACATTG AAATGCCCGT TTTGCGCCCG TGGCTTGCCG 

551 GCGGCGTGTG CCTTGTCTTT CTGTATTGTT TTTCCGGGTT CGGGCTGGCG 

601 CTGCTGCTGG GCGGCAGCCG TTATGCCACG GTCGAAGTGG AAATTTACCA 

651 GTTGGTCATG TTCGAACTCG ATATGGCGGT TGCTTCGGTG CTGGTGTGGC 

701 TGGTGTTGGG GGTAACGGCG GCGGCAGGGT TGCTGTATGC GTGGTTCGGC 

751 AGGCGCGCGG TTTCGGATAA GGCGGTTTCC CCTGTGATGC CGTCGCCGCC 

801 GCAGTCGGTC GGGGAATATG TGCTGCTGGC GTTTGCGGCG GCGGTGTTGT 

851 CTGTGTGCTG CCTGTTTCCT TTGTTGGCAA TTGTTGTGAA AGCGTGGTCG 

901 GCCGGCGAAT CGTGGCGTGT GTTAATGGAA AGTGAAACGT GGCAGGCGGT 

951 GTGGAATACT TTGCGCTTCT CGGCGGCGGG GGTGTATGCG GCGGCGGTTT 

1001 TGGGTGTGGT GTATGCGGCG GCGGCGCGGC GGTCGGCGTG GATGCGCGGG 

1051 CTGATGTTTT TGCCGTTTAT GGTGTCGCCG GTTTGTGTTT CGGCGGGCGT 

1101 GCTGCTGCTT TATCCGCAGT GGACGGCTTC GTTGCCGTTG CTGCTGGCGA 

1151 TGTATGCGCT GCTGGCGTAT CCGTTTGTGG CAAAAGATGT TTTATCAGCC 

1201 TGGGATGCAC TGCCGCCGGA TTACGGCAGG GCGGCGGCGG GTTTGGGTGC 

1251 AAACGGCTTT CAGACGGCAT GCCGCATCAC GTTCCCCCTC TTGAAACCGG 

1301 CGTTGCGGCG CGGTCTGACT TTGGCGGCGG CAACCTGCGT GGGCGAATTT 

1351 GCGGCGACAT TGTTTCTGTC GCGTCCGGAA TGGCAGACGC TGACGACTTT 

1401 GATTTATGCC TATTTGGGAC GCGCGGGTGA GGATAATTAC GCGCGGGCGA 

1451 TGGTGCTGAC ATTGCTGTTG GCGGCGTTCG CGCTGGGTAT TTTCCTGCTG 

1501 TTGGACGGCG GCGAAGGCGG AAAACAGACG GAAACGTTAT AA 

This corresponds to the amino acid sequence <SEQ ID 576; ORF139-l>: 

1 MDGRRWWWG AFALLPSAFL AVMWAPLWA VAAYDGLAWR AVLSDAYMLK 

51 RLAWTVFQAA ATCVLVLPLG VPVAWV LARL AFPGRALVLR LLML PFVMPT 

101 LVAGVGVLAL FG ADGLLWRG RQDTPYLLLY GNVFFNLPVL VRAAYQGFVQ 

151 VPAARLQTAR TLGAGAWRRF WDIEMPVLRP WLAGG VCLVF LYCFSGFGLA 
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201 LLLGG3RYAT VEVEIYQLVM FELDMRV ASV LVWLVLGVTA AAGLL YAWFG 

251 RRAVSDKAVS PVMPSPPQSV GEYVLLAFA A AVLSVCCLFP hLhlW KANS 

301 AGESWRVLME SETWQAVWNT LRFS AAAVYA AAVLGWYAA A ARRSAWMRG 

351 LMF LPFMVSP VCVSAGVLLL YPQWTAS LPL LLAMYALLAY PFVA KDVLSA 

401 WDALPPDYGR AAAGLGANGF QTACRITFPL LKPALRRGLT LAAATCVGEF 

451 AATLFLSRPE WQTLTTLIYA YLGRAGEDNY ARA WXTLLL AAFALGIFLL 

501 LDGGEGGKQT ETL* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N.meninsitidis (strain A) 

ORF139 shows 94.7% identity over a 189aa overlap with an ORF (ORF139a) from strain A of M 
meningitidis: 



AWSAGESWRVLMESETWHAVWNTLRFSAAA 
I I I I I I I I I I I I I I I I I : I I I I I I I I I I I 
QSVGEYVLLAF AAAVXSVCCLFXLLAIW KAWSAGESWRVLMESETWQAVWNTXRFSAAA 



VYAAAVLGWYAAP ARRSAWMRGLMF XPFMVSPVCVSAGVLLL YPQWTAS LPLLLAMYAL 
VYAAAVLGWYAAA ARRSAWMRGLMF LPFMVSPVCVSAGVLLL XPQWTAS LPLLLAMYAL 



)rf 139 .pep LAYPFVA KDVLSAWDALPPDYGRAAAGLGANGFQTACRITFPLLKPALRRGLTLAAATCV 
>rfl39a LAYPFVA KDVLSAXDALPPDYGRAAAGLGANGFQTACRITFPLLKPALRRGLTLAAATCV 



)rfl3 9.pep GEFAATLFLSRPEWQTLTTLIYAYLGHAGEDNYARRMyL 
I I I I M I I II I I I I I I I I I I I I II I I I I I I I I I I I 
)rfl3 9a GEFAATLFXSRXEWQTLTTLIYAYXGRAGXDNYARAM VLTLLLAAFALGXFLLL DGGEGG 



The complete length ORF139a nucleotide sequence <SEQ ID 577> is: 



1 ATGGATGGAC GGCGTTGGGC 

51 GGCTTTTTTG GCGGCAATGG 

101 ATGACGGTTT GGCGTGGCGC 

151 CGTTTGGCGT GGACGGTATT 

201 GCCTTTGGGC GTGCCTGTCG 

251 GGCGGGCTTT GGTGCTGCGC 

301 TTGGTGGCGG GCGTGGGCGT 

351 GTGGCGCGGC TGGCAGGATA 

401 TTTTTNACCT TCCTGTGTTG 

451 GTGCCTGCGG CACGGCTTCA 

501 GCGGCGGTTT TGGGACATTG 

551 GCGGCGTGTG CCTTGTCTTC 

601 TTGCTGCTGG GCGGCAGCCG 

651 GTTGGTCATG TTCGAACTCG 

701 TGGTGTNGGG GGTAACNGCG 

751 AGGCGCGCGG TTTCGGATAA 

801 GCAGTCGGTC GGGGAATATG 

851 CTGTGTGCTG CCTGTTTCNT 

901 GCCGGCGAAT CGTGGCGTGT 

951 GTGGAATACT NTGCGCTTCT 

1001 TGGGTGTGGT GTATGCGGCG 

1051 CTGATGTTTT TGCCGTTTAT 

1101 GCTGCTGCTT NATCCGCAGT 

1151 TGTATGCGCT GCTGGCGTAT 

1201 TGNGATGCAC TGCCGCCGGA 

1251 AAACGGCTTT CAGACGGCAT 

1301 CGTTGCGGCG CGGTCTGACT 

1351 GCGGCAACCT TGTTCNTGTC 



GGTATGGGGT GCTTTTGCCC TGCTGCCTTC 
TCGTTGCGCC TTTGTGGGCG' GTGGCGGCGT 
GCGGTGCTGT CGGATGCCTA TATGCTCAAA 
TCAGGCAGCG GCAACCTGTG TGCTGGTGCT 
CGTGGGTGCT GGCGCGGCTG GCGTTTCCGG 
CTGCTGATGC TGCCTTTTGT GATGCCCACG 
GCTGGCTCTG TTCGGGGCGG acggcctgtn 

cgccgtatct gttgttgtac ggcaatgtgt 
gtcagggcgg catatcaggg gtttgtgcaa 
gacggcacng acattgggcg cgggggcgtg 
aaatgcccgt tttgcgcccg tggcttgccg 
ctgtattgtt tttcggggtt cgggctggca 
ttatgccacg gtcgaagtgg aaatttacca 
atatggcggt tgcttcggtg ctngtgtggc 
gcggcagggt tgctgtatgc gtggttcggc 
ggcngtttcc cctgtgatgc cgtcgccgcc 
tgctnctggc gtttgcggcg gcggtgtngt 
ttgttggcaa ttgttgtgaa agcgtggtcg 
gttaatggaa agtgaaacgt ggcaggcggt 
cggcggcggc ggtgtatgcg gcggcggttt 
gcggcgcggc ggtcggcgtg gatgcgcggg 
ggtgtcgccg gtttgtgttt cggcgggcgt 
ggacggcttc gttgccgctg ctgctggcga 
ccgtttgtgg caaaagatgt tttatcagcc 
ttacggcagg gcggcggcgg gtttgggtgc 
gccgcatcac gttccccctc ttgaaaccgg 
ttggcggcgg caacctgcgt gggcgaattt 
gcgtcncgag tggcagacgc tgacgacttt 
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14 01 GATTTATGCC TATNTGGGAC GCGCGGGTGA NGATAATTAC GCGCGGGCGA 
14 51 TGGTGCTGAC ATTGCTGTTG GCGGCGTTCG CGCTGGGTAT NTTCCTGCTG 
1501 TTGGACGGCG GCGAAGGCGG AAAACGGACG GAAACGTTAT AA 

This encodes a protein having amino acid sequence <SEQ ID 578>: 

1 MDGRRWAVWG AFALLPSAFL AAMWAPLWA VAAYDGLAWR AVLSDAYMLK 
51 RLAWTVFQAA ATCVLVLPLG VPVAWVLARL AFPGRALVLR LLMLPFVMPT 



VPAARLQTAX 
LLLGGSRYAT 
RRAVSDKAVS 
AGESWRVLME 
LMF LPFMVSP 



FGA DGLXWRG 
TLGAGAWRRF 
VEVEIYQLVM 
PVMPSPPQSV 
SETWQAVWNT 
VCVSAGVLLL 



XDALPPDYGR 1 
AATLFXSRXE i 
LDGGEGGKRT I 



WQDTPYLLLY GNVFFXLPVL VRAAYQGFVQ 
WDIEMPVLRP WLAGG VCLVF LYCFSGFGLA 
FELDMAVA SV LVWLVXGVTA AAGLL YAWFG 
GEYVLLAFA A AVXSVCCLFX LLAIVV KAWS 
XRFS AAAVYA AAVLGVVYAA A ARRSAWMRG 
XPQWTAS LPL LLAMYALLAY PFVA KDVLSA 
QTACRITFPL LKPALRRGLT LAAATCVGEF 
YXGRAGXDNY ARAMVLTLLL AAFALGXFLL 



ORF139a and ORF139-1 show 96.5% homology over a 514aa overlap: 



11:111 

MDGRRWWWGAFALLPSAFLAVMVVAPLWAVAAYDGLAWRAVLSDAYMLKRLAWTVFQAA 
I . pep ATCVLVLPLGVPVAWVLARLAFPGRALVLRLLMLPFVMPTLVAGVGVLALFGADGLXWRG 
ATCVLVLPLGVPVAWVLARLAFPGRALVLRLLMLPFVMPTLVAGVGVLALFGADGLLWRG 



orf 139a . pep MDGRRWAVWGAFALLPSAFLAAMVVAPLWAVAAYDGLAWRAVLSDAYMLKRLAWTVFQAA 
orfl39-: 
orfl39a 
orfl39-l 

orf 139a . pep WQDTPYLLLYGNVFFXLPVLVRAAYQGFVQVPAARLQTAXTLGAGAWRRFWDIEMPVLRP 
orf 139-1 RQDTPYLLLYGNVFFNLPVLVRAAYQGFVQVPAARLQTARTLGAGAWRRFWDIEMPVLRP 
orf 139a . pep WLAGGVCLVFLYCFSGFGLALLLGGSRYATVEVEIYQLVMFELDMAVASVLVWLVXGVTA 
orf 139-1 WLAGGVCLVFLYCFSGFGLALLLGGSRYATVEVEIYQLVMFELDMAVASVLVWLVLGVTA 
orf 139a . pep AAGLLYAWFGRRAVSDKAVSPVMPSPPQSVGEYVLLAFAAAVXSVCCLFXLLAIWKAWS 
AAGLLYAWFGRRAVSDKAVSPVMPSPPQSVGEYVLLAFAAAVLSVCCLFPLLAIWKAWS 
AGESWRVLMESETWQAVWNTXRFSAAAVYAAAVLGWYAAAARRSAWMRGLMFLPFMVSP 
AGESWRVLMESETWQAVWNTLRFSAAAVYAAAVLGWYAAAARRSAWMRGLMFLPFMVSP 



.rfl39-l 

>rfl39a.p6 

.rfl39-l 



orf 139a. pep VCVSAGVLLLXPQWTASLPLLLAMYALLAYPFVAKDVLSAXDALPPDYGRAAAGLGANGF 

orf 13 9-1 VCVSAGVLLLYPQWTASLPLLLAMYALLAYPFVAKDVLSAWDALPPDYGRAAAGLGANGF 

orf 13 9a. pep QTACRITFPLLKPALRRGLTLAAATCVGEFAATLFXSRXEWQTLTTLIYAYXGRAGXDNY 

orf 139-1 QTACRITFPLLKPALRRGLTLAAATCVGEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNY 

orf 139a . pep ARAMVLTLLLAAFALGXFLLLDGGEGGKRTETLX 



I I I 



I I I I I I I 



I I 



I I I I I 



orf 139-1 ARAMVLTLLLAAFALGIFLLLDGGEGGKQTETLX 

Homology with a predicted ORF from N.sonorrhoeae 

ORF139 shows 95.2% identity over a 189aa overlap with a predicted ORF (ORF139ng) from 
N.gonorrhoeae: 

orf 139. pep AWSAGESWRVLMESETWHAVWNTLRFSAAA 30 

orfl39ng QSVGEYVLLAFSVAVLSVCCLFPLSAIWKAWSAGESRRVLMESETWQAVWNTLRFSAAA 327 

orf 139 . pep VYAAAVLGWYAAPARRSAWMRGLMFXPFMVSPVCVSAGVLLLYPQWTASLPLLLAMYAL 90 

orfl39ng VFAAAVLGWYAAAARRLVWMRGLVFLPFMVSPVCVSAGVLLLYPGWTASLPLLLAMYAL 387 
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orf 139 . pep LAYPFVAKDVLSAWDALPPDYGRAAAGLGANGFQTACRITFPLLKPALRRGLTLAAATCV 150 

orfl39ng LAYPFVAKDVLSAWDALPPDYGRAAAGLGANGFQTACRITFPLLKPALRRGLTLAAATCV 4 47 

orfl39.pep GEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNYARAMVL 189 

I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I I I I I 

orfl39ng GEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNYARAMVLTLLLSAFAVCIFLLLDNGEGG 507 

The complete length OKF139ng nucleotide sequence <SEQ ID 579> is predicted to encode 
protein having amino acid sequence <SEQ ID 580>: 



1 MDGRCWAVRG AFSLLPSAFL AVMWAPLWA VAAYDGLAWR AVLSDAYMLK 

51 RLAWTVFQAA ATCVLVLPLG VPVAWVLARL AFPGRALVLR LLMLPFVMPT 

101 LVAGVGVLAL FGADGLLWRG RQDTPYLLLY GNVFFNLPVL VRAAYQGFAQ 

151 VPAARLQTAR TLGAGAWRPF WDIEMP VLRP WLAGGVCLVF LYCFSGFGLA 

201 LLLGGSRYAT VEVEIYQLVM FELDMAGASA LVWLVLGVTA AAGLLYAWFG 

251 RRAVSDKAVS PVMPSPPQSV GEYVLLAFSV AVLSVCCLFP LSAIWKAWS 

301 AGESRRVLME SETWQAVWNT LRFSAAAVFA AAVLGVVYAA AARRLVWMRG 

351 LVFLPFMVSP VCVSAGVLLL YPGWTASLPL LLAMYALLAY PFVAKDVLSA 

401 WDALPPDYGR AAAGLGANGF QTACRITFPL LKPALRRGLT LAAATCVGEF 

451 AATLFLSRPE WQTLTTLIYA YLGRAGEDNY ARAMVLTLLL SAFAVCIFLL 

501 LDNGEGGKRT ETL* 

Further work revealed a variant gonococcal DNA sequence <SEQ ID 581>: 



1 ATGGATGGAC GGTGTTGGGC GGTACGGGGT GCTTTTTCCC TGCTGCCTTC 

51 GGCTTTTTTG GCGGTAATGG TCGTTGCGCC TTTGTGGGCG GTGGCGGCGT 

101 ATGACGGTTT GGCGTGGCGC GCGGTGCTGT CGGATGCCTA TATGCTCAAA 

151 CGTTTGGCGT GGACGGTGTT TCAGGCGGCG GCAACCTGTG TGCTGGTGCT 

201 GCCTTTGGGC GTGCCTGTCG CGTGGGTGCT GGCGCGGCTG GCGTTCCCGG 

251 GGCGGGCTTT GGTGCTGCGC CTGCTGATGC TGCCGTTTGT GATGCCCACG 

301 CTGGTGGCGG GCGTGGGCGT GCTGGCTCTG TTCGGGGCGG ACGGGCTGTT 

351 GTGGCGCGGC CGGCAGGATA CGCCGTATCT GTTGTTGTAC GGCAATGTGT 

401 TTTTCAACCT GCCCGTGTTG GTCAGGGCGG CGTATCAGGG GTTTGCTCAA 

451 GTGCCTGCGG CACGGCTTCA GACGGCACGG ACGTTGGGCG CGGGGGCGTG 

501 GCGGCGGTTT TGGGACATTG AAATGCCCGT TTTGCGCCCG TGGCTTGCCG 

551 GCGGCGTGTG CCTTGTCTTC CTGTATTGTT TTTCGGGGTT CGGGCTGGCA 

601 TTGCTGTTGG GCGGCAGCCG TTATGCCACG GTCGAAGTGG AAATTTACCA 

651 GTTGGTTATG TTCGAACTCG ATATGGCGGG GGCTTCGGCG CTGGTGTGGC 

701 TGGTGTTGGG GGTAACGGCG GCGGCAGGGT TGCTGTATGC GTGGTTCGGC 

751 AGGCGCGCGG TTTCGGATAA GGCGGTTTCC CCCGTGATGC CGTCGCCGCC 

801 GCAATCGGTG GGGGAATATG TATTGCTGGC ATTTTCGGTG GCGGTGTTGT 

851 CCGTGTGCTG CCTGTTTCCT TTGTCGGCAA TTGTTGTGAA AGCGTGGTCG 

901 GCCGGCGAAT CGCGGCGTGT GTTAATGGAA AGTGAAACGT GGCAGGCAGT 

951 GTGGAATACt ttGCGCTTTT CGGCGGCGGC GGTGTTTGCG GCGGCGGTTT 

1001 TGGGTGTGGT GTATGCGGCG GCGGCGCGGC GGCTGGTGTG GATGCGCGGA 

1051 CTGGTGTTTT TACCGTTTAT GGTGTCGCCG GTTTGTGTTT CGGCGGGCGT 

1101 GCTGCTGCTT TATCCGGGGT GGACGGCTTC GTTACCGCTG CTGCTGGCGA 

1151 TGTATGCGCT GCTGGCGTAT CCGTTTGTGG CAAAAGATGT TTTATCGGCC 

1201 TGGGATGCAC TGCCGCCGGA TTACGGCAGG GCGGCGGCAG GTTTGGGCGC 

1251 AAACGGCTTT CAGACGGCAT GCCGTATCAC GTTCCCCCTC TTGAAACCGG 

1301 CGTTGCGGCG CGGTCTGACT TTGGCGGCGG CGACGTGTGT GGGCGAATTT 

1351 GCGGCAACCT TGTTCCTGTC GCGTCCGGAA TGGCAGACGT TGACGACTTT 

14 01 GATTTATGCC TATTTGGGGC GTGCGGGTGA GGACAATTAT GCGCGGGCAA 

1451 TGGTGTTGAC ATTGCTGTTG TCGGCATTTG CGGTGTGCAT TTTCCTGCTG 

1501 TTGGACAACG GCGAAGGCGg aaaACGGACG GAAACGTTAT AA 

This corresponds to the amino acid sequence <SEQ ID 582; ORF139ng-l>: 

1 MDGRCWAVRG AFSLLPSAFL AVMWAPLWA VAAYDGLAWR AVLSDAYMLK 

51 RLAWTVFQAA ATCVLVLPLG VPVAWVL ARL AFPGRALVLR LLMLP FVMPT 

101 LVAGVGVLAL FG ADGLLWRG RQDTPYLLLY GNVFFNLPVL VRAAYQGFAQ 

151 VPAARLQTAR TLGAGAWRPF WDIEMPVLRP WLAGG VCLVF LYCFSGFGLA 

201 LLLGGSRYAT VEVEIYQLVM FELDMAGA SA LVWLVLGVTA AAGLL YAWFG 

251 RRAVSDKAVS PVMPSPPQSV GEYVLLAFS V AVLSVCCLFP LSAIW KAWS 

301 AGESRRVLME SETWQAVWNT LRFS AAAVFA AAVLGWYAA AA RRLVWMRG 

351 LVF LPFMVSP VCVSAGVLLL YPGWTASL PL LLAMYALLAY PFVA KDVLSA 

401 WDALPPDYGR AAAGLGANGF QTACRITFPL LKPALRRGLT LAAATCVGEF 

451 AATLFLSRPE WQTLTTLIYA YLGRAGEDNY ARAM VLTLLL SAFAVCIFLL 

501 LDNGEGGKRT ETL* 
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ORF139ng-l and ORF139-1 show 95.9% identity over 513aa overlap: 

orfl3 9ng MDGRCWAVRGAFSLLPSAFLAVMVVAPLWAVAAYDGLAWRAVLSDAYMLKRLAWTVFQAA 

or f 13 9-1 MDGRRWWWGAFALLPSAFLAVMVVAPLWAVAAYDGLAWRAVLSDAYMLKRLAWTVFQAA 

orfl39ng ATCVLVLPLGVPVAWVLARLAFPGRALVLRLLMLPFVMPTLVAGVGVLALFGADGLLWRG 

orf 139-1 ATCVLVLPLGVPVAWVLARLAFPGRALVLRLLMLPFVMPTLVAGVGVLALFGADGLLWRG 

orfl39ng RQDTPYLLLYGNVFFNLPVLVRAAYQGFAQVPAARLQTARTLGAGAWRRFWDIEMPVLRP 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 139-1 RQDTPYLLLYGNVFFNLPVLVRAAYQGFVQVPAARLQTARTLGAGAWRRFWDIEMPVLRP 

orfl39ng WLAGGVCLVFLYCFSGFGLALLLGGSRYATVEVEIYQLVMFELDMAGASALVWLVLGVTA 

orf 139-1 WLAGGVCLVFLYCFSGFGLALLLGGSRYATVEVEIYQLVMFELDMAVASVLVWLVLGVTA 

orfl39ng AAGLLYAWFGRRAVSDKAVSPVMPSPPQSVGEYVLLAFSVAVLSVCCLFPLSAIWKAWS 

orf 139-1 AAGLLYAWFGRRAVSDKAVSPVMPSPPQSVGEYVLLAFAAAVLSVCCLFPLLAIWKAWS 

orfl39ng AGESRRVLMESETWQAVWNTLRFSAAAVFAAAVLGWYAAAARRLVWMRGLVFLPFMVSP 
I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I : I I I I I : I I I I I I I I 
orf 139 AGESWRVLMESETWQAVWNTLRFSAAAVYAAAVLGWYAAAARRSAWMRGLMFLPFMVSP 

orfl39ng VCVSAGVLLLYPGWTASLPLLLAMYALLAYPFVAKDVLSAWDALPPDYGRAAAGLGANGF 

orf 139-1 VCVSAGVLLLYPQWTASLPLLLAMYALLAYPFVAKDVLSAWDALPPDYGRAAAGLGANGF 

orfl39ng QTACRITFPLLKPALRRGLTLAAATCVGEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNY 

orf 139-1 QTACRITFPLLKPALRRGLTLAAATCVGEFAATLFL3RPEWQTLTTLIYAYLGRAGEDNY 

orfl39ng ARAMVLTLLLSAFAVCIFLLLDNGEGGKRTETL 

orfl39-l ARAMVLTLLLAAFALG I FLLLDGGEGGKQTETL 

Based on the presence of a predicted binding-protein-dependent transport systems inner membrane 
component signature (iinderlined) in the gonococcal protein, it is predicted that the proteins from 
N. meningitidis and N.gonorrhoeae, and their epitopes, could be useful antigens for vaccines or 
diagnostics, or for raising antibodies. 

Example 70 

The following partial DNA sequence was identified mN. meningitidis <SEQ ID 583>: 

1 ATGGACGGCT GGACACAGAC GCTGTCCGCG CAAACCCTGT TGGGCATTTC 

51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAGA TTCCGCATCC 

101 ACGCGCTGCT GACACTGGTC ATCGTCAGCC TGCTGACGGC TTTGGCAACC 

151 GGTTTGCCCA CAGGCAGCAT TGTCAAAGAC ATACTGGTCA AAAACTTCGG 

201 CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGCCTGGGC GCGATGCTCG 

251 AACGTTTGGT C. . . 

This corresponds to the amino acid sequence <SEQ ID 584; ORF140>: 



Further work revealed the complete nucleotide sequence <SEQ ID 585>: 

1 ATGGACGGCT GGACACAGAC GCTGTCCGCG CAAACCCTGT TGGGCATTTC 

51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAAA TTCCGCATCC 

101 ACGCGCTGCT GACACTGGTC ATCGTCAGCC TGCTGACGGC TTTGGCAACC 

151 GGTTTGCCCA CAGGCAGCAT TGTCAACGAC ATACTGGTCA AAAACTTCGG 



wo 99/24578 



-334- 



PCT/IB98/01665 



201 CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGCCTGGGC GCGATGCTCG 

251 GACGTTTGGT CGAAACATCC GGCGGCGCAC AGTCGCTGGC GGACGCGCTG 

3 01 ATCCGGATGT TCGGCGAAAA ACGCGCACCG TTCGCGCTGG GCGTTGCCTC 

351 GCTGATTTTC GGCTTCCCGA TTTTCTTCGA TGCCGGACTA ATCGTCATGC 

401 TGCCCATCGT GTTCGCCACC GCACGGCGCA TGAAACAGGA CGTACTGCCC 

451 TTCGCGCTTG CCTCCATCGG CGCATTTTCC GTCATGCACG TCTTCCTGCC 

501 GCCCCATCCG GGCCCGATTG CCGCTTCCGA ATTTTACGGC GCGAACATCG 

551 GCCAAGTTTT GATTTTGGGT CTGCCGACCG CCTTCATCAC ATGGTATTTC 

601 AGCGGCTATA TGCTCGGCAA AGTGTTGGGG CGCACCATCC ATGTTCCCGT 

651 TCCCGAACTG CTCAGCGGCG GCACGCAAGA CAACGACCTG CCGAAAGAAC 

701 CTGCCAAAGC AGGAACGGTC GTCGCCATCA TGCTGATTCC CATGCTGCTG 

7 51 ATTTTCCTGA ATACCGGCGT ATCGGCCCTC ATCAGCGAAA AACTCGTAAG 

801 TGCGGACGAA ACCTGGGTTC AGACGGCAAA AATAATCGGT TCGACACCGA 

851 TCGCCCTTCT GATTTCCGTA TTGGTCGCAC TGTTTGTCTT GGGACGCAAA 

901 CGCGGCGAAA GCGGCAGCGC GTTGGAAAAA ACCGTGGACG GCGCACTCGC 

951 CCCCGTCTGT TCCGTGATTC TGATTACCGG CGCGGGCGGT ATGTTCGGCG 

1001 GCGTTTTGCG CGCTTCCGGC ATCGGCAAGG CACTCGCCGA CAGCATGGCG 

1051 GATTTGGGCA TTCCCGTCCT TTTGGGCTGT TTCCTTGTCG CCTTGGCACT 

1101 GCGTATCGCG CAAGGTTCGG CAACCGTCGC CCTGACCACC GCCGCCGCGC 

1151 TGATGGCTCC TGCCGTTGCC GCCGCCGGCT TTACCGACTG GCAGCTCGCC 

1201 TGTATCGTAT TGGCAACGGC GGCAGGTTCG GTCGGTTGCA GCCACTTCAA 

1251 CGACTCCGGC TTCTGGCTGG TCGGCCGTCT CTTGGACATG GACGTACCGA 

1301 CCACGCTGAA AACCTGGACG GTCAACCAAA CCCTCATCGC ACTCATCGGC 

1351 TTTGCCTTGT CCGCACTGCT GTTCGCCATC GTCTGA 

This corresponds to the amino acid sequence <SEQ ID 586; ORF140-1>: 



1 MDGWTQTLSA QTLLGISAAA IILILILIVK FRIHALLTLV IVSLLTALAT 

51 GLPTGSIVND ILVKNFGGTL GGVALLVGLG AMLGRLV ETS GGAQSLADAL 

101 IRMFGEKRAP FALGVA5 LIF GFPIFFDAGL IVML PIVFAT ARRMKQDVLP 

151 FALASIGAFS VMHV FLPPHP GPIAASEFYG ANIGQVLILG LPTAFITWYF 

201 SGYMLGKVLG RTIHVPVPEL LSGGTQDNDL PKEPAK AGTV VAIMLIPMLL 

251 IFLNTGVSAL ISEKLVSADE TWVQTAKIIG S TPIALLISV LVALFVLG RK 

301 RGESGSALEK TVDGALAPVC SVILITGAGG MFGGVL RASG IGKALADSMA 

351 DLG IPVLLGC FLVALALRIA QGSAT VALTT AAALMAPAVA AA GFTDWQLA 

401 CIVLATAAGS VGCSHFNDSG FWLVGRLLDM DVPTTLKTWT VNQT LIALIG 

451 FALSALLFAI V * 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N.meninsitidis (strain A) 

ORF140 shows 95.4% identity over a 87aa overlap with an ORF (ORF140a) from strain A of iV. 
meningitidis: 

10 20 30 40 50 60 

orfl40.pep MDGWTQTLSAQTLLGISAAAIILILILIVRFRIHALLTLVIVSLLTALATG LPTGSIVKD 

orfl40a MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATG LPTGSIVND 
10 20 30 40 50 60 



ILVKNFGGTL GGVALLVGLGAMLERLV 

VLVKN FGGT L GGVALLVGLGAMLGRLV ETSGGAQSLADAL I RNFGEKRAP FALGVAS L I F 



The complete length ORF 140a nucleotide sequence <SEQ ID 587> is: 



1 ATGGACGGCT GGACACAGAC GCTGTCCGCG CAAACCCTGT TGGGCATTTC 

51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAAA TTCCGCATCC 

101 ACGCGCTGCT GACACTGGTC ATCGTCAGCC TGCTGACGGC TTTGGCAACC 

151 GGTTTGCCCA CAGGCAGCAT TGTCAACGAC GTACTGGTCA AAAACTTCGG 

201 CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGCCTGGGC GCGATGCTCG 

251 GACGTTTGGT CGAAACATCC GGCGGCGCAC AGTCGCTGGC GGACGCGCTG 

301 ATCCGGATGT TCGGCGAAAA ACGCGCACCG TTCGCGCTGG GCGTTGCCTC 

351 GCTGATTTTC GGCTTCCCGA TTTTCTTCGA TGCCGGACTA ATCGTCATGC 

401 TGCCCATCGT GTTCGCCACC GCACGGCGCA TGAAACAGGA CGTACTGCCC 

451 TTCGCGCTTG CCTCCATCGG CGCATTTTCC GTCATGCACG TCTTCCTGCC 
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1001 
1051 
1101 
1151 
1201 
1251 
1301 
1351 



GCCCCATCCG 
GCCAAGTTTT 
AGCGGCTATA 
TCCCGAACTG 
CTGCCAAAGC 
ATTTTCCTGA 
TGCGGACGAA 
TCGCCCTTCT 
CGCGGCGAAA 
CCCCGTCTGT 
GCGTTTTGCG 
GATTTGGGCA 
GCGTATCGCG 
TGATGGCTCC 
TGTATCGTAT 
CGACTCCGGC 
CCACGCTGAA 
TTTGCCTTGT 



GGCCCGATTG 
GATTTTGGGT 
TGCTCGGCAA 
CTCAGCGGCG 
AGGAACGGTC 
ATACCGGCGT 
ACCTGGGTTC 
GATTTCCGTA 
GCGGCAGCGC 
TCCGTGATTC 
CGCTTCCGGC 
TTCCCGTCCT 
CAAGGTTCGG 
TGCCGTTGCC 
TGGCAACGGC 
TTCTGGCTGG 
AACCTGGACG 
CCGCACTGCT 



CCGCTTCCGA 
CTGCCGACCG 
AGTGTTGGGG 
GCACGCAAGA 
GTCGCCATCA 
ATCGGCCCTC 
AGACGGCAAA 
TTGGTCGCAC 
GTTGGAAAAA 
TGATTACCGG 
ATCGGCAAGG 
TTTGGGCTGT 
CAACCGTCGC 
GCCGCCGGCT 
GGCAGGTTCG 
TCGGCCGCCT 
GTCAACCAAA 
GTTCGCCATC 



ATTTTACGGC 
CCTTCATCAC 
CGCACCATCC 
CAACGACCTG 
TGCTGATTCC 
ATCAGCGAAA 
AATAATCGGT 
TGTTTGTCTT 
ACCGTGGACG 
CGCGGGCGGT 
CACTCGCCGA 
TTCCTTGTCG 
CCTGACCACC 
TTACCGACTG 
GTCGGTTGCA 
CTTGGACATG 
CCCTCATCGC 
GTCTGA 



GCGAACATCG 
ATGGTATTTC 
ATGTTCCCGT 
CCGAAAGAAC 
CATGCTGCTG 
AACTCGTAAG 
TCGACACCGA 
GGGACGCAAA 
GCGCACTCGC 
ATGTTCGGCG 
CAGCATGGCG 
CCTTGGCACT 
GCCGCCGCGC 
GCAGCTCGCC 
GCCACTTCAA 
GACGTACCGA 
ACTCATCGGC 



This encodes a protein having amino acid sequence <SEQ ID 588>: 

1 MDGWTQTLSA QTLLGISAAA IILILILIVK FRIHALLTLV IVSLLTALAT 

51 GLPTGSIVND VLVKNFGGTL GGVALLVGLG AMLGRLV ETS GGAQSLADAL 

101 IRMFGEKRAP FALGVAS LIF GFPIFFDAGL IVML PIVFAT ARRMKQD VLP 

151 FALASIGAFS VMHV FLPPHP GPIAASEFYG ANIGQVLILG LPTAFITWYF 

201 SGYMLGKVLG RTIHVPVPEL LSGGTQDNDL PKEPAK AGTV VAIMLIPMLL 

251 IFLNTGVSAL ISEKLVSADE TWVQTAKIIG S TPIALLISV LVALFVLG RK 

301 RGESGSALEK TVDGALAPVC SVILITGAGG MFGGVL RASG IGKALADSMA 

351 DLG IPVLLGC FLVALALRIA QGSAT VALTT AAALMAPAVA AA GFTDWQLA 

401 CIVLATAAGS VGCSHFNDSG FWLVGRLLDM DVPTTLKTWT VNQT LIALIG 

451 FALSALLFAI V * 

ORF140a and ORF140-1 show 99.8% identity over a 461aa overlap: 



MDGWTQTL3AQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATGLPTGSIVND 
MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATGLPTGSIVND 

ILVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFALGVASLIF 1 

VLVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFALGVASLIF 1 

GFPIFFDAGLIVMLPIVFATARRMKQDVLPFALASIGAFSVMHVFLPPHPGPIAASEFYG 1 

GFPIFFDAGLIVMLPIVFATARRMKQDVLPFALASIGAFSVMHVFLPPHPGPIAASEFYG 6 

ANIGQVLILGLPTAFITWYFSGYMLGKVLGRTIHVPVPELLSGGTQDNDLPKEPAKAGTV 2 

ANIGQVLILGLPTAFITWYFSGYMLGKVLGRTIHVPVPELLSGGTQDNDLPKEPAKAGTV 2 

VAIMLIPMLLIFLNTGVSALISEKLVSADETWVQTAKIIGSTPIALLISVLVALFVLGRK 3 

VAIMLIPMLLIFLNTGVSALISEKLVSADETWVQTAKIIGSTPIALLISVLVALFVLGRK 2 

RGESGSALEKTVDGALAPVCSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGC 3 

RGESGSALEKTVDGALAPVCSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGC 3 

FLVALALRIAQGSATVALTTAAALMAPAVAAAGFTDWQLACIVLATAAGSVGCSHFNDSG 4 

FLVALALRIAQGSATVALTTAAALMAPAVAAAGFTDWQLACIVLATAAGSVGCSHFNDSG 4 
FWLVGRLLDMDVPTTLKTWTVNQTLIALIGFALSALLFAIV 4 61 
orfl4 0a FWLVGRLLDMDVPTTLKTWTVNQTLIALIGFALSALLFAIV 4 61 

Homology with a predicted QRF from N.sonorrhoeae 

ORF140 shows 92% identity over a 87aa overlap with a predicted ORF (ORF140ng) f 
N.gonorrhoeae: 



orf 140-1 . pep 

orfl40a 

orf 140-1 .pep 

orfl40a 

orf 140-1 .pep 

orfl40a 

orf 140-1 .pep 

orfl40a 

orf 140-1. pep 

orfl40a 

orf 140-1 . pep 

orfl40a 

orfl40-l .pep 

orfl40a 

orf 140-1. pep 
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orfl40 .pep MDGWTQTLSAQTLLGISAAAIILILILIVRFRIHALLTLVIVSLLTALATGLPTGSIVKD 60 

orfl40ng MDGRTQTLSAQTLLGISAAAIILILILIVKFRIRALLTLVIASLLTALATGLPTGSIVND 60 

orfl40.pep ILVKNFGGTLGGVALLVGLGAMLERLV 87 

: I I I I I I I I I I I I I I I I I I I I I I III 
orfl40ng VLVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFAPGVASLIF 120 

The complete length ORF140ng nucleotide sequence <SEQ ID 589> was predicted to encode a 
protein having amino acid sequence <SEQ ID 590>: 

1 MDGRTOTLSA QTLLGIS AAA IILI LILIVK FRIRALLTLV lASLLTALAT 

51 GLPTGSIVND VLVKNFGGTL GGVALLVGLG AMLGRLV ETS GGAQSLADAL 

101 IRMFGEKRAP FAPGVAS LIF GFPIFFDAGL IVML PIVFAT ARRMKQDVLP 

151 FALASVGAFS VMHV FLPPHP GPIAASEFYG ANIGQVLILG LPTAFITWYF 

201 SGYMLGKVLG RAIHVPVPEL LSGGTQDSDP PKEPAK AGTV VAVMLIPMLL 

251 IFLNTGVSAL ISEKLVSADE TWVQTAKMIG S TPVALLISV LAALLVLG RK 

301 RGESGSTLEK TVDGALAPA C SVILITGAGG MFGGVL RASG IGKALADSMA 

351 DLG IPVLLGC FLVALALRIA QGSAT VALTT AAALMAPAVA AA GFTDWQLA 



20 Further work revealed a variant gonococcal DNA sequence <SEQ ID 591>: 

1 ATGGACGGCC GGACACAGAC GCTGTCCGCG CAAACCTTGT TGGGCATTTC 

51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAAA TTCCGCATCC 

101 GCGCGCTGCT GACACTGGTC ATCGCCAGCC TGCTGACGGC TTTGGCAACC 

151 GGTTTGCCCA CAGGCAGCAT CGTCAACGAC GTACTGGTCA AAAACTTCGG 

25 201 CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGTCTGGGC GCAATGCTCG 

251 GACGTTTGGT AGAAACATCC GGCGGCGCAC AGTCGCTGGC GGACGCGCTG 

301 ATCCGGATGT TCGGCGAAAA ACGCGCACCG TTCGCTCCGG GCGTTGCCTC 

351 GCTGATTTTC GGCTTCCCGA TTTTCTTCGA TGCCGGACTA ATCGTCATGC 

401 TGCCCATCGT ATTCGCCACC GCACGGCGCA TGAAACAGGA CGTACTGCCC 

30 4 51 TTCGCGCTTG, CCTCCGTCGG CGCATTTTCC GTCATGCACG TCTTCCTGCC 

501 GCCCCATCCG GGCCCGATTG CCGCTTCCGA ATTTTACGGC GCGAACATCG 

551 GCCAGGTTTT GATTTTGGGT CTGCCGACCG CCTTCATCAC ATGGTATTTC 

601 AGCGGCTATA TGCTCGGCAA AGTGTTGGGG CGCGCCATCC ATGTTCCCGT 

651 TCCCGAACTG CTCAGCGGCG GCACGCAAGA CAGCGACCCG CCGAAAGAAC 

35 701 CTGCCAAAGC AGGAACGGTC GTCGCCGTCA TGCTGATTCC CATGCTGCTG 

7 51 ATTTTCCTGA ATACCGGCGT ATCAGCCCTC ATCAGCGAAA AACTCGTAAG 

801 TGCGGACGAA ACTTGGGTTC AGACGGCAAA AATGATCGGT TCGACACCTG 

851 TCGCCCTTCT GATTTCCGTA TTGGCCGCAC TGTTGGTCTT GGGACGCAAA 

901 CGCGGCGAAA GCGGCAGCAC GTTGGAAAAA ACCGTGGACG GCGCACTCGC 

40 951 CCCCGCCTGT TCCGTGATTC TGATTACCGG CGCGGGCGGT ATGTTCGGCG 

1001 GCGTTTTGCG CGCTTCCGGC ATCGGCAAGG CACTCGCCGA CAGCATGGCG 

1051 GATTTGGGCA TTCCCGTCCT TTTGGGCTGC TTCCTTGTCG CCTTGGCACT 

1101 GCGTATCGCG CAAGGTTCGG CAACCGTCGC CCTGACCACA GCCGCCGCGC 

1151 TGATGGCTCC TGCCGTTGCC GCCGCCGGCT TTACCGACTG GCAGCTCGCC 

45 1201 TGTATCGTAT TGGCAACGGC GGCAGGTTCG GTCGGTTGCA GCCACTTCAA 

1251 CGACTCCGGC TTCTGGCTGG TCGGCCGCCT CTTGGATATG GACGTACCGA 

1301 CCACGCTGAA AACCTGGACG GTCAACCAAA CCCTCATCGC ATTCATCGGC 

1351 TTTGCCTTGT CCGCACTGCT GTTTGCCATC GTCTGA 

This corresponds to the amino acid sequence <SEQ ID 592; ORF140ng-l>: 

50 1 MDGRTQTLSA QTLLGISAAA IILILILIVK FRIRALLTLV lASLLTALAT 

51 GLPTGSIVND VLVKNFGGTL GGVALLVGLG AMLGRLV ETS GGAQSLADAL 

101 IRMFGEKRAP FAPGVAS LIF GFPIFFDAGL IVML PIVFAT ARRMKQDVLP 

151 FALASVGAFS VMHV FLPPHP GPIAASEFYG ANIGQVLILG LPTAFITWYF 

201 SGYMLGKVLG RAIHVPVPEL LSGGTQDSDP PKEPAK AGTV VAVMLIPMLL 

55 251 IFLNTGVSAL ISEKLVSADE TWVQTAKMIG S TPVALLISV LAALLVLG RK 

301 RGESGSTLEK TVDGALAPAC SVILITGAGG MFGGVL RASG IGKALADSMA 

351 DLG IPVLLGC FLVALALRIA QGSAT VALTT AAALMAPAVA AA GFTDWQLA 

401 CIVLATAAGS VGCSHFNDSG FWLVGRLiDM DVPTTLKTWT VNQT LIAFIG 

451 FALSALLFAI V * 

60 ORF140ng-l and ORF140-1 show 96.3% identity over 461aa overlap: 



orfl40ng-l.pep MDGRTQTLSAQTLLGISAAAIILILILIVKFRIRALLTLVIASLLTALATGLPTGSIVND 
III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I : I I I I I I I I I I I I I I I I II 
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orfl4 0-l MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATGLPTGSIVND 
orfl4 0ng-l.pep VLVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFAPGVASLIF 
5 orfl4 0-l ILVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFALGVASLIF 

orfl40ng-l .pep GFPIFFDAGLIVMLPIVFATARRMKQDVLPFALASVGAFSVMHVFLPPHPGPIAASEFYG 
orfl4 0-l GFPIFFDAGLIVMLPIVFATARRMKQDVLPFALASIGAFSVMHVFLPPHPGPIAASEFYG 

10 

orfl40ng-l.pep ANIGQVLILGLPTAFITWYFSGYMLGKVLGRAIHVPVPELLSGGTQDSDPPKEPAKAGTV 

orf 140-1 ANIGQVLILGLPTAFITWYFSGYMLGKVLGRTIHVPVPELLSGGTQDNDLPKEPAKAGTV 

15 orf 140ng-l .pep VAVMLIPMLLIFLNTGVSALISEKLVSADETWV.QTAKMIGSTPVALLISVLAALLVLGRK 

orf 140-1 VAIMLIPMLLIFLNTGVSALISEKLVSADETWVQTAKIIGSTPIALLISVLVALFVLGRK 

orf 14 0ng-l . pep RGESGSTLEKTVDGALAPACSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGC 
20 I I I I I I : I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I i I I I I 1 I I I I I I I I I I I I I I 

orf 140-1 RGESGSALEKTVDGALAPVCSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGC 

orf 140ng-l . pep FLVALALRIAQGSATVALTTAAALMAPAVAAAGFTDWQLACIVLATAAGSVGCSHFNDSG 

25 orf 1 4 0-1 FLVALALRIAQGSATVALTTAAALMAPAVAAAGFTDWQLACIVLATAAGSVGCSHFNDSG 

orf 140ng-l .pep FWLVGRLLDMDVPTTLKTWTVNQTLIAFIGFALSALLFAIV 

orf 140-1 FWLVGRLLDMDVPTTLKTWTVNQTLIALIGFALSALLFAIV 

30 Furthermore, ORF140ng-l is homologous to an E.coli protein: 

gi 1 882533 (D29579) ORF_o454 [Escherichia coli] >gi 1 1789097 (AE000358) o454; 
This 454 aa ORF is 34% identical (9 gaps) to 444 residues of an approx. 456 ; 
protein GNTP_BACLI SW: P46832 [Escherichia coli] Length = 454 
Score = 210 bits (529), Expect = le-53 

■ ■ 3 = 130/384 (33%), Positives = 194/384 (49%), Gaps = 19/384 (4%) 

ETSGGAQSLADALIRMFGEKRAPFAPGVASLIFGFPIFFDAGLIVMLPIVFATARRMKQD 147 
E SGGA+SLA+ R G+KR A +A+ G P+FFD G I++ PI++ A+ K 
EHSGGAESLANYFSRKLGDKRTIAALTLAAFFLGIPVFFDVGFIILAPIIYGFAKVAKIS 139 

VLPFALASVGAFSVMHVFLPPHPGPIAASEFYGANIGQVLILGLPTAFITWYFSGYMLGK 207 
L F L G +HV +PPHPGP+AA+ A+IG + I+G+ +1 GY K 



Ident: 




Query: 


88 


Sbjct: 


80 




148 


Sbjct: 


140 




208 


Sb j ct : 


199 




258 


Sbjct: 


256 




318 


Sbjct: 


313 




378 


Sbjct: 


371 




438 


Sbjct: 


431 



A VIL+TGAGG+FG VL SG+GKALA+ + + +P+L F+++LALR +QGS 



; +G SH NDSGFW+V + I 



Based on this analysis, including the identification of the presence of a putative leader sequence 
(double-underlined) and several putative transmembrane domains (single-imderlined) in the 
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gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 71 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 593>: 

1 . .GATTTCGGCA TATCGCCCGT GTATCTTTGG GTTGCCGCCG CGTTCAAACA 

51 TTTGCTGTCG CCGTGGGCTG CCGACTCATA CGATGTCGCA CGCTTTGCAG 

101 GCGTATTTTT TGCCGTTATC GGACTGACTT CCTGCGGCTT TGCCGGTTTC 

151 AACTTTTTGG GCAGACACCA CGGGCGCAC . GTCGTCCTGA TTCTCATCGG 

201 CTGTATCGGG CTGATTCCAG TTGCCCATTT CCTCAACCCC GCTGCCGCCG 

251 CCTTTGCCGC CGCCGGACTG GTGCTGCACG GTTATTCTTT GGCTCGCCGG 

301 CGCGTGATTG CCGCCTCTTT TCTGCTCGGT ACGGGCTGGA CGCTGATGTC 

351 GTTGGCAGCA GCTTATCCGG CAGCATTTGC CCTGATGCTG CCCTTGCCCG 

401 TACTGATGTT TTTCCGTCCG . . 

This corresponds to the amino acid sequence <SEQ ID 594; 0RF141>: 



1 . . DFGISPVYLW VAAAFKHLLS PWAADSYDVA RFAGVFFAVI GLTSCGFAGF 
51 NFLGRHHGRX WLILIGCIG LIPVAHFLNP AAAAFAAAGL VLHGYSLARR 
101 RVIAASFLLG TGWTLMSLAA AYPAAFALML PLPVLMFFRP . . 

Further work revealed the complete nucleotide sequence <SEQ ID 595>: 



1 ATGCTGACCT ATACCCCGCC CGATGCCCGC CCGCCCGCCA AAACCCACGA 

51 AAAGCCGTGG CTGCTGCTGT TGATGGCGTT TGCCTGGTTG TGGCCCGGCG 

101 TGTTTTCCCA CGATTTGTGG AATCCTGACG AACCTGCCGT CTATACCGCC 

151 GTCGAAGCAC TGGCAGGCAG CCCCACCCCC TTGGTTGCCC ATCTGTTCGG 

201 TCAAACCGAT TTCGGCATAC CGCCCGTGTA TCTTTGGGTT GCCGCCGCGT 

251 TCAAACATTT GCTGTCGCCG TGGGCTGCCG ACTCATACGA TGCCGCACGC 

301 TTTGCAGGCG TATTTTTTGC CGTTATCGGA CTGACTTCCT GCGGCTTTGC 

351 CGGTTTCAAC TTTTTGGGCA GACACCACGG GCGCAgCGTC GTCCTGATTC 

401 TCATCGGCTG TATCGGGCTG ATTCCAGTTG CCCATTTCCT CAACCCCGCT 

451 GCCGCCGCGT TTGCCGCCGC CGGACTGGTG CTGCACGGTT ATTCTTTGGC 

501 TCGCCGGCGC GTGATTGCCG CCTCTTTTCT GCTCGGTACG GGCTGGACGC 

551 TGATGTCGTT GGCAGCAGCT TATCCGGCAG CATTTGCCCT GATGCTGCCC 

601 TTGCCCGTAC TGATGTTTTT CCGTCCGTGG CAAAGCAGGC GTTTGATGTT 

651 GACGGCAGTC GCCTCACTTG CCTTTGCCCT GCCGCTTATG ACCGTTTACC 

7 01 CGCTGCTCTT GGCAAAAACG CAGCCCGCGC TGTTCGCGCA ATGGCTCGAC 

7 51 TATCACGTTT TCGGTACGTT CGGCGGCGTG CGGCACGTTC AGACGGCATT 

801 CAGTTTGTTT TACTATCTGA AAAACCTGCT TTGGTTTGCA TTGCCCGCGC 

851 TGCCGCTGGC GGTTTGGACG GTTTGCCGCA CGCGCCTGTT TTCGACCGAC 

901 TGGGGGATTT TGGGCGTCGT CTGGATGCTT GCCGTTTTGG TGCTGCTTGC 

951 CGTCAATCCG CAGCGTTTTC AGGATAACCT CGTCTGGCTG CTTCCGCCGC 

1001 TTGCCCTGTT CGGCGCGGCG CAACTGGACA GCCTGAGGCG CGGCGCGGCG 

1051 GCGTTTGTCA ACTGGTTCGG CATTATGGCG TTCGGACTGT TTGCCGTGTT 

1101 CCTGTGGACG GGCTTTTTCG CCATGAATTA CGGCTGGCCC GCCAAGCTTG 

1151 CCGAACGCGC CGCCTATTTC AGCCCGTATT ATGTTCCTGA TATCGATCCC 

1201 ATTCCGATGG CGGTTGCCGT ACTGTTCACA CCCTTGTGGC TGTGGGCGAT 

1251 TACCCGGAAA AACATACGCG GCAGGCAGGC GGTTACCAAC TGGGCGGCAG 

1301 GCGTTACCCT GACCTGGGCT TTGCTGATGA CGCTGTTCCT GCCGTGGCTG 

1351 GACGCGGCGA AAAGCCACGC GCCGGTCGTC CGGAGTATGG AGGCATCGCT 

1401 TTCCCCGGAA TTGAAACGGG AGCTTTCAGA CGGCATCGAG TGTATCGGCA 

14 51 TAGGCGGCGG CGACCTGCAC ACGCGGATTG TTTGGACGCA GTACGGCACA 

1501 TTGCCGCACC GCGTCGGCGA TGTACAATGC CGCTACCGCA TCGTCCTCCT 

1551 GCCCCAAAAT GCGGATGCGC CGCAAGGCTG GCAGACGGTT TGGCAGGGTG 

1601 CGCGTCCGCG CAACAAAGAC AGTAAGTTCG CACTGATACG GAAAATCGGG 

1651 GAAAATATAT AA 

This corresponds to the amino acid sequence <SEQ ID 596; 0RF141-1>: 

1 MLTYTPPDAR PPAKTHEKPW LLLLMAFAWL WPGVFS HDLW NPDEPAVYTA 

51 VEALAGSPTP LVAHLFGQTD FGIPPVYLWV AAAFKHLLSP WAADSYDAAR 

101 FAGVFFAVIG LTSCGFAGFN FLGRHHGRS V VLILIGCIGL IPVAHF LNPA 

151 AAAFAAAGLV LHGYSLARRR VIAASFLLGT GWTLMSL AAA YPAAFALMLP 

201 LPVLMFF RPW QSRRL MLTAV ASLAFALPLM TV YPLLLAKT QPALFAQWLD 
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251 YHVFGTFGGV RHVQTAFSLF YYLKNLLWFA LPALPLAVWT VCRTRLFSTD 

301 W GILGWWML AVLVLLAW P QRFQDNLVWL LPPLALFGAA QLDSLRRGAA 

351 AFVNWFGIMA FGLFAVFLWT GFFAMNYGWP AKLAERAAYF SPYYVPDIDP 

401 IPMAVAVLFT PLWLWAI TRK NIRGRQAVTN WAAGVTLTWA LLMTLFL PWL 

451 DAAKSHAPVV RSMEASLSPE LKRELSDGIE CIGIGGGDLH TRIVWTQYGT 

501 LPHRVGDVQC RYRIVLLPQN ADAPQGWQTV WQGARPRNKD SKFALIRKIG 

551 ENI* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted QRF from N.meninsitidis (strain A) 

0RF141 shows 95.0% identity over a 140aa overlap with an ORF (ORFHla) from sfrain A of A^. 
meningitidis: 

10 20 30 

orfl41.pep DFGIS PVYLWVAAAFKHLLS PWAADS YDVA 

orfl41a WNPDEPAVYTAVEALAGSPTPLVAHLFGQIDFGIPPVYLWVAAAFKHLLSPWAADPYDAA 



orf 141 .pep R FAGVFFAVIGLTSCGFA GFNFLGRHHGRX VVLILIGCIGLIPVAHF LNPAAAAFAAAGL 
orfl41a R FAGVFFAWGLTSCGFA GFNFLGRHHGRS WLILIGCIGLIPTVHF LNPAAAAFAAfiGL 



orf 141. pep VLHGYSLARRR VIAASFLLGTGWTLMSL AAA YPAAFALMLPLPVLMFF RP 

orf 14 la VLHGYSLARRR VIAASFLLGTGWTLMSL AAA YPAAFALMLPLPVLMFF RPWQSRRL MLTA 



The complete length 0RP14 la nucleotide sequence <SEQ ID 597> is: 



1 ATGCTGACCT ATACCCCGCC CGATGCCCGC CCGCCCGCCA AAACCCACGA 

51 AAAGCCGTGG CTGTTGCTGT TGATGGCGTT TGCCTGGTTG TGGCCCGGCG 

101 TGTTTTCCCA CGATTTGTGG AATCCTGACG AACCTGCCGT CTATACCGCC 

151 GTCGAAGCAC TGGCAGGCAG CCCCACCCCT TTGGTTGCCC ATCTGTTCGG 

201 TCAAATCGAT TTCGGCATAC CGCCCGTGTA TCTTTGGGTT GCCGCCGCGT 

251 TCAAACATTT GCTGTCGCCG TGGGCTGCCG ACCCGTATGA TGCCGCACGC 

301 TTTGCCGGCG TGTTTTTCGC CGTTGTCGGA CTGACTTCCT GCGGCTTTGC 

351 CGGTTTCAAC TTTTTGGGCA GACACCACGG GCGCAGCGTC GTCCTGATTC 

4 01 TCATCGGCTG TATCGGGCTG ATTCCGACCG TACACTTTCT CAACCCCGCT 

451 GCCGCCGCGT TTGCCGCCGC CGGACTGGTG CTGCACGGTT ATTCTTTGGC 

501 TCGCCGGCGC GTGATTGCCG CCTCTTTTCT GCTCGGTACG GGTTGGACGC 

551 TGATGTCGTT GGCAGCAGCT TATCCGGCGG CATTTGCCCT GATGCTGCCC 

601 CTGCCCGTGC TGATGTTTTT CCGTCCGTGG CAAAGCAGGC GTTTGATGTT 

651 GACGGCAGTC GCCTCGCTTG CCTTTGCCCT GCCGCTTATG ACCGTTTACC 

701 CGCTGCTCTT GGCAAAAACG CAGCCCGCGC TGTTCGCGCA ATGGCTCGAC 

751 GATCACGTTT TCGGTACGTT CGGCGGCGTG CGGCACATTC AGACGGCATT 

801 CAGTTTGTTT TACTATCTGA AAAACCTGCT TTGGTTTGCA TTGCCTGCGC 

851 TGCCGCTGGC GGTTTGGACG GTTTGCCGCA CGCGCCTGTT TTCGACCGAC 

901 TGGGGGATTT TGGGCGTCGT CTGGATGCTT GCCGTTTTGG TGCTGCTTGC 

951 CGTCAATCCG CAGCGTTTTC AGGATAACCT CGTCTGGCTG CTTCCGCCGC 

1001 TTGCCCTGTT CGGCGCGGCG CAACTGGACA GCCTGAGACG CGGCGCGGCG 

1051 GCGTTTGTCA ACTGGTTCGG CATTATGGCG TTCGGACTGT TTGCCGTGTT 

1101 CCTGTGGACG GGCTTTTTCG CCATGAATTA CGGCTGGCCC GCCAAGCTTG 

1151 CCGAACGCGC CGCCTATTTC AGCCCGTATT ATGTTCCTGA TATCGATCCC 

1201 ATTCCGATGG CGGTTGCCGT ACTGTTCACA CCCTTGTGGC TGTGGGCGAT 

1251 TACCCGCAAA AACATACGCG GCAGGCAGGC GGTTACCAAC TGGGCGGCAG 

1301 GCGTTACCCT GACCTGGGCT TTGCTGATGA CGCTGTTCCT GCCGTGGCTG 

1351 GACGCGGCGA AAAGCCACGC GCCCGTCGTC CGGAGTATGG AGGCATCGCT 

14 01 TTCCCCGGAA TTAAAACGGG AGCTTTCAGA CGGCATCGAG TGTATCGACA 

14 51 TAGGCGGCGG CGACCTACAC ACGCGGATTG TTTGGACGCA GTACGGCACA 

1501 TTGCCGCACC GCGTCGGCGA TGTACAATGC CGCTACCGCA TCGTCCGCTT 

1551 GCCCCAAAAC GCGGATGCGC CGCAAGGCTG GCAGACGGTC TGGCAGGGTG 
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1601 CGCGCCCGCG CAACAAAGAC AGTAAGTTCG CACTGATACG GAAAACCGGG 
1651 GAAAATATAT TAAAAACAAC AGATTGA 

This encodes a protein having amino acid sequence <SEQ ID 598>: 



1 MLTYTPPDAR PPAKTHEKPW LLLLMAFAWL WPGVFS HDLW NPDEPAVYTA 

51 VEALAGSPTP LVAHLFGQID FGIPPVYLWV AAAFKHLLSP WAADPYDAAR 

101 FAGVFFAVVG LTSCGFA GFN FLGRHHGRS V VLILIGCIGL IPTVHF LNPA 

151 AAAFAAAGLV LHGYSLARRR VIAASFLLGT GWTLMSLA AA YPAAFALMLP 

201 LPVLMFF RPW QSRRL MLTAV ASLAFALPLM TV YPLLLAKT QPALFAQWLD 

251 DHVFGTFGGV RHIQTAFSLF YYLKNLLWFA LPALPLAVWT VCRTRLFSTD 

301 W GILGVVWML AVLVLLAVN P QRFQDNLVWL LPPLALFGAA QLDSLRRGAA 

351 AFVNWFGIMA FGLFAVFLWT GFFAMNYGWP AKLAERAAYF SPYYVPDIDP 

401 IPMAVAVLFT PLWLWAI TRK NIRGRQAVTN WAAGVTLTWA LLMTLFL PWL 

451 DAAKSHAPVV RSMEASLSPE LKRELSDGIE CIDIGGGDLH TRIVWTQYGT 

501 LPHRVGDVQC RYRIVRLPQN ADAPQGWQTV WQGARPRNKD SKFALIRKTG 

551 ENILKTTD* 

0RF141a and 0RF141-1 show 98.2% identity in 553 aa overlap: 



MLTYTPPDARPPAKTHEKPWLLLLMAFAWLWPGVFSHDLWNPDEPAVYTAVEALAGSPTP 
MLTYTPPDARPPAKTHEKPWLLLLMAFAWLWPGVFSHDLWNPDEPAVYTAVEALAGSPTP 
LVAHLFGQIDFGIPPVYLWVAAAFKHLLSPWAADPYDAARFAGVFFAWGLTSCGFAGFN 
LVAHLFGQTDFGIPPVYLWVAAAFKHLLSPWAADSYDAARFAGVFFAVIGLTSCGFAGFN 
FLGRHHGRSWLILIGCIGLIPTVHFLNPAAAAFAAAGLVLHGYSLARRRVIAASFLLGT 
FLGRHHGRSWLILIGCIGLIPVAHFLNPAAAAFAAAGLVLHGYSLARRRVIAASFLLGT 



GWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTAVASLAFALPLMTVYPLLLAKT 
GWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTAVASLAFALPLMTVYPLLLAKT 
QPALFAQWLDDHVFGTFGGVRHIQTAFSLFYYLKNLLWFALPALPLAVWTVCRTRLFSTD 
QPALFAQWLDYHVFGTFGGVRHVQTAFSLFYYLKNLLWFALPALPLAVWTVCRTRLFSTD 
WGILGWWMLAVLVLLAVNPQRFQDNLVWLLPPLALFGAAQLDSLRRGAAAFVNWFGIMA 
WGILGVVWMLAVLVLLAVNPQRFQDNLVWLLPPLALFGAAQLDSLRRGAAAFVNWFGIMA 
FGLFAVFLWTGFFAMNYGWPAKLAERAAYFSPYYVPDIDPIPMAVAVLFTPLWLWAITRK 
FGLFAVFLWTGFFAMNYGWPAKLAERAAYFSPYYVPDIDPIPMAVAVLFTPLWLWAITRK 



NIRGRQAVTNWAAGVTLTWALLMTLFLPWLDAAKSHAPWRSMEASLSPELKRELSDGIE 
NIRGRQAVTNWAAGVTLTWALLMTLFLPWLDAAKSHAPWRSMEASLSPELKRELSDGIE 
CIDIGGGDLHTRIVWTQYGTLPHRVGDVQCRYRIVRLPQNADAPQGWQTVWQGARPRNKD 
CIGIGGGDLHTRIVWTQYGTLPHRVGDVQCRYRIVLLPQNADAPQGWQTVWQGARPRNKD 



orf 141a. pep SKFALIRKTGENI 
orfl41-l SKFALIRKIGENI 



Homology with a predicted ORF from N.sonorrhoeae 

0RF141 shows 95% identity over a 140aa overlap with a predicted ORF (0RF141ng) from 
N. gonorrhoeae: 

orfl41.pep DFGISPVYLWVAAAFKHLLSPWAADSYDVA 30 

orfl41ng WNPAE PAVYTAVEALAGS PT PLVAHL FGQT D FGI P PVYLWVAAAFKHLLS PWAAHPYDAA 126 
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orfl41.pep RFAGVFFAVIGLTSCGFAGFNFLGRHHGRXVVLILIGCIGLIPVAHFLNPiW\AFAAAGL 90 

or f 1 4 Ing RFAGVFFAVIGLTSCGFAGFNFLGRHHGRSVVLIHIGCIGLIPVAHFFNPAAAAFAAAGL 18 6 

orfl41.pep VLHGYSLARRRVIAASFLLGTGWTLMSLAAAYPAAFALMLPLPVLMFFRP 140 

or f 14 Ing VLHGYSLARRRVIAASFLLGTGWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTA 24 6 

An 0RF141ng nucleotide sequence <SEQ ID 599> was predicted to encode a protein having amino 
acid sequence <SEQ ID 600>: 



1 MPSEAVSARP LCEYLLHLAI RPFLLTLMLT YTPPDARPPA KTHEKP WLLL 

51 LMAFAWLWPG VFS HDLWHPA EPAVYTAVEA LAGSPTPLVA HLFGQTDFGI 

101 PPVYLWVAAA FKHLLSPWAA HPYDAAR FAG VFFAVIGLTS CGFA GFNFLG 

151 RHHGRS WLI HIGCIGLIPV AHF FNPAAAA FAAAGLVLHG YSLARRRVIA 

201 ASFLLGTGWT LMSL AA AYPA AFALMLPLPV LMFF RPWQSR RL MLTAVASL 

251 AFALPLMTV Y PLLLAKTQPA LFAQWLNYHV FGTFGGVRHI QRAFSLFHYL 

301 KNLLWFAPPG LPLAVWTVCR TRLFSTDW GI LGIVWMLAVL VLLAF NPQRF 

351 QDNLVWLLPP LALFGAAQLD SLRRGAAAFV NWFG IMAFGL FAVFLWTGFF 

401 AMNYGWPAKL AERAAYFSPY YVPDIDP IPM AVAVLFTPLW LWAI TRKNIR 

451 GRQAVTN WAA GVTLTWALLM TLFL PWLDAA KSHAPWRSM EASFSPELKR 

501 ELSDGIECIG IGGGDLHTRI VWTQYGTLPH RVGDVRCRYR IVRLPQNADA 

551 PQGWQTVWQG ARPRNKDSKF ALIRKIGENI LKTTD* 

Further work revealed the following gonococcal DNA sequence <SEQ ID 601>: 



1 ATGCTGACCT ATACCCCGCC CGATGCCCGC CCGCCCGCCA AAACCCACGA 

51 AAAACCGTGG CTGCTGCTGT TGATGGCGTT TGCCTGGCTG TGGCCCGGCG 

101 TGTTTTCCCA CGATTTGTGG AATCCTGCCG AACCTGCCGT CTATACCGCC 

151 GTCGAAGCAC TGGCAGGCAG CCCCACCCCC TTGGTTGCCC ATCTGTTCGG 

201 TCAAACCGAT TTCGGCATAC CGCCCGTGTA TCTTTGGGTT GCCGCCGCAT 

251 TCAAACATTT GCTGTCGCCG TGGGCAGCCG ACCCGTATGA TGCCGCACGC 

301 TTTGCAGGCG TATTTTTTGC CGTTATCGGA CTGACTTCTT GCGGCTTTGC 

351 CGGTTTCAAC TTTTTGGGCA GACACCACGG GCGCAGCGTT GTTTTAATCC 

401 ATATCGGCTG TATCGGGCTG ATTCCGGTTG CCCATTTCCT CAATCCcgcc 

451 gccgccgcct tTGCCGCCGC CGGACTGGTG CTGCacggct actcgctgGC 

501 ACGCCGGCGC GTGATtgccg cctctTtccT GCTCGGTACG GGTTGGACGT 

551 TGATGTCGCT GGCGGCAGCT TATCCGGCGG CGTTTGCGCT GATGCTGCCC 

601 CTGCCCGTGC TGATGTTTTT CCGTCCGTGG CAAAGCAGGC GTTTGATGTT 

651 GACGGCAGTC GCCTCGCTTG CCTTTGCCCT GCCGCTTATG ACCGTTTACC 

701 CGCTGCTCtt gGCAAAAACG CAGCCCGCGC TGTTTGCGCA ATGGCTCAAC 

751 TATCACGTTT TCGGTACGTt cggcgGCGTG CGGCAcaTTC AGAggGCatT 

801 Cagtttgttt cactatctgA AAaatctgct ttggttcgca ccgcccgggC 

851 TGCCGCTGGC GGTTTGGACG GTTTGCCGCA CACGCCTGTT TTCGACCGAC 

901 TGGGGGATTT TGGGCATTGT CTGGATGCTT GCCGTTTTGG TGCTGCTCGC 

951 CTTTAATCCG CAGCGTTTTC AAGACAACCT CGTCTGGCTG CTGCCGCCGC 

1001 TTGCCCTGTT CGGCGCGGCG CAACTGGACA GCCTGAGGCG CGGCGCGGCG 

1051 GCTTTTGTCA ACTGGTTCGG CATTATGGCG TTCGGGCTGT TTGCCGTGTT 

1101 CCTGTGGACG GGCTTTTTCG CCATGAATTA CGGCTGGCCC GCCAAGCTTG 

1151 CCGAACGCGC CGCCTACTTC AGCCCGTATT ACGTTCCCGA CATCGATCCC 

1201 ATTCCGATGG CGGTTGCCGT ACTGTTCACA CCCTTGTGGC TGTGGGCGAT 

1251 TACCCGGAAA AACATACGCG GCAGGCAGGC GGTTACCAAC TGGGCGGCAG 

1301 GCGTTACCCT GACCTGGGCT TTGCTGATGA CGCTGTTCCT GCCGTGGCTG 

1351 GACGCGGCGA AAAGCCACGC GCCCGTCGTC CGGAGTATGG AGGCATCGTT 

1401 TTCCCCGGAA TTAAAACGGG AGCTTTCAGA CGGCATCGAG TGTATCGGCA 

1451 TAGGCGGCGG CGACCTGCAC ACGCGGATTG TTTGGACGCA GTACGGCACA 

1501 TTGCCGCACC GCGTCGGCGA TGTCCGTTGC CGCTACCGTA TCGTCCGCCT 

1551 GCCCCAAAAC GCGGATGCGC CGCAAGGCTG GCAGACGGTC TGGCAGGGTG 

1601 CGCGCCCGCG CAACAAAGAC AGTAAGTTTG CACTGATACG GAAAATCGGG 

1651 GAAAATATAT TAAAAACAAC AGATTGA 

This corresponds to the amino acid sequence <SEQ ID 602; 0RP141ng-l>: 



1 MLTYTPPDAR PPAKTHEKPW LLLLMAFAWL WPGVFS HDLW NPAEPAVYTA 

51 VEALAGSPTP LVAHLFGQTD FGIPPVYLWV AAAFKHLLSP WAADPYDAAR 

101 FAGVFFAVIG LTSCGFA GFN FLGRHHGRS V VLIHIGCIGL IPVAHF LNPA 

151 AAAFAAAGLV LHGYSLARRR VIAASFLLGT GWTLM3L AAA YPAAFALMLP 

201 LPVLMFF RFW QSRRL MLTAV ASLAFALPLM TV YPLLLAKT QPALFAQWLN 

251 YHVFGTFGGV RHIQRAFSLF HYLKNLLWFA PPGLPLAVWT VCRTRLFSTD 

301 W GILGIVWML AVLVLLAF NP QRFQDNLVWL LPPLALFGAA QLDSLRRGAA 
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AFVNWFG IMA FGLFAVFLWT GFFA MNYGWP AKLAERAAYF SPYYVPDIDP 
IPMAVAVLFT PLWLWAI TRK NIRGRQAVTN WAAGVTLTWA LLMTLFL PWL 
DAAKSHAPW RSMEASFSPE LKRELSDGIE CIGIGGGDLH TRIVWTQYGT 
LPHRVGDVRC RYRIVRLPQN ADAPQGWQTV WQGARPRNKD SKFALIRKIG 
ENILKTTD* 



0RF141ng-l and 0RF141-1 show 97.5% identity in 553 aa overlap: 

orf 141ng-l . pep MLTYTPPDARPPAKTHEKPWLLLLMAFAWLWPGVFSHDLWNPAEPAVYTAVEALAGSPTP 

orf 141-1 MLTYTPPDARPPAKTHEKPWLLLLMAFAWLWPGVFSHDLWNPDEPAVYTAVEALAGSPTP 

orf 141ng-l . pep LVAHLFGQTDFGIPPVYLWVAAAFKHLLSPWAADPYDAARFAGVFFAVIGLTSCGFAGFN 
I i I ! I I I I I I 1 M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 141-1 LVAHLFGQTDFGIPPVYLWVAAAFKHLLSPWAADSYDAARFAGVFFAVIGLTSCGFAGFN 

orfl41ng-l.pep FLGRHHGRSVVLIHIGCIGLIPVAHFLNPAAAAFAAAGLVLHGYSLARRRVIAASFLLGT 

i I I I I I I I I ! I I I I i I I I I I i I I I I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 14 1-1 FLGRHHGRSVVLILIGCIGLIPVAHFLNPAAAAFAAAGLVLHGYSLARRRVIAASFLLGT 

orfl41ng-l .pep GWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTAVASLAFALPLMTVYPLLLAKT 

orf 14 1-1 GWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTAVASLAFALPLMTVYPLLLAKT 

orf 14 lng-1 . pep QPALFAQWLNYHVFGTFGGVRHIQRAFSLFHYLPCNLLWFAPPGLPLAVWTVCRTRLFSTD 

orf 14 1-1 QPALFAQWLDYHVFGTFGGVRHVQTAFSLFYYLKNLLWFALPALPLAVWTVCRTRLFSTD 

orf 141ng-l . pep WGILGIVWMLAVLVLLAFNPQRFQDNLVWLLPPLALFGAAQLDSLRRGAAAFVNWFGIMA 

orf 14 1-1 WGILGWWMLAVLVLLAVNPQRFQDNLVWLLPPLALFGAAQLDSLRRGAAAFVNWFGIMA 

orf 141ng-l . pep FGLFAVFLWTGFFAMNYGWPAKLAERAAYFSPYYVPDIDPIPMAVAVLFTPLWLWAITRK 

orf 141-1 FGLFAVFLWTGFFAMNYGWPAKLAERAAYFSPYYVPDIDPIPMAVAVLFTPLWLWAITRK 

orf 14 lng-1. pep NIRGRQAVTNWAAGVTLTWALLMTLFLPWLDAAKSHAPWRSMEASFSPELKRELSDGIE 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I j I ! I : I I I I I I I I I I I j I 
orf 14 1-1 NIRGRQAVTNWAAGVTLTWALLMTLFLPWLDAAKSHAPWRSMEASLSPELKRELSDGIE 

orf 14 lng-1 . pep CIGIGGGDLHTRIVWTQYGTLPHRVGDVRCRYRIVRLPQNADAPQGWQTVWQGARPRNKD 

orf 14 1-1 CIGIGGGDLHTRIVWTQYGTLPHRVGDVQCRYRIVLLPQNADAPQGWQTVWQGARPRNKD 

orf 14 lng-1. pep SKFALIRKIGENILKTTDX 

I I I I I I I I I I I I I 
orfl41-l SKFALIRKIGENIX 

Based on the presence of several putative transmembrane domains in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 72 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 603>; 

1 . . CAATCCGCCA AATGGTTATC GGGCCAAACT CTAGTCGGCA CAGCAATTGG 

51 GATACGCGGG CAGATAAAGC TTGGCGGCAA CCTGCATTAC GATATATTTA 

101 CCGGCCGCGC ATTGAAAAAG CCCGAATTTT TCCAATCAAG GAAATGGGCA 

151 AGCGGTTTTC AGGTAGGCTA TACGTTTTAA 

This corresponds to the amino acid sequence <SEQ ID 604; ORF142>: 

1 . . QSilfMLSGQT LVGTAIGIRG QIKLGGNLHY DIFTGRALKK PEFFQSRKWA 
51 SGFQVGYTF* 

Further work revealed the complete nucleotide sequence <SEQ ID 605>: 
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1 ATGGATAATT CGGGTAGTGA GGCGACAGGA AAATACCAAG GAAATATCAC 

51 TTTCTCTGCC GACAATCCTT TGGGACTGAG TGATATGTTC TATGTAAATT 

101 ATGGACGTTC GATTGGCGGT ACGCCCGATG AGGAAAGTTT TGACGGCCAT 

151 CGCAAAGAAG GCGGATCAAA CAATTACGCC GTACATTATT CAGCCCCTTT 

201 CGGTAAATGG ACATGGGCAT TCAATCACAA TGGCTACCGT TACCATCAGG 

251 CAGTTTCCGG ATTATCGGAA GTCTATGACT ATAATGGAAA AAGTTACAAT 

301 ACTGATTTCG GCTTCAACCG CCTGTTGTAT CGTGATGCCA AACGCAAAAC 

351 CTATCTCGGT GTAAAACTGT GGATGAGGGA AACAAAAAGT TACATTGATG 

401 ATGCCGAACT GACTGTACAA CGGCGTAAAA CTGCGGGTTG GTTGGCAGAA 

451 CTTTCCCACA AAGAATATAT CGGTCGCAGT ACGGCAGATT TTMGTTGAA 

501 ATATAAACGC GGCACCGGCA TGAAAGATGC TCTGCGCGCG CCTGAAGAAG 

551 CCTTTGGCGA AGGCACGTCA CGTATGAAAA TTTGGACGGC ATCGGCTGAT 

601 GTAAATACTC CTTTTCAAAT CGGTAAACAG CTATTTGCCT ATGACACATC 

651 CGTTCATGCA CAATGGAACA AAACCCCGCT AACATCGCAA GACAAACTGG 

701 CTATCGGCGG ACACCACACC GTACGTGGCT TCGACGGTGA AATGAGTTTG 

751 TCTGCCGAGC GGGGATGGTA TTGGCGCAAC GATTTGAGCT GGCAATTTAA 

801 ACCAGGCCAT CAGCTTTATC TTGGGGCTGA TGTAGGACAT GTTTCAGGAC 

851 AATCCGCCAA ATGGTTATCG GGCCAAACTC TAGTCGGCAC AGCAATTGGG 

901 ATACGCGGGC AGATAAAGCT TGGCGGCAAC CTGCATTACG ATATATTTAC 

951 CGGCCGCGCA TTGAAAAAGC CCGAATTTTT CCAATCAAGG AAATGGGCAA 

1001 GCGGTTTTCA GGTAGGCTAT ACGTTTTAA 

This corresponds to the amino acid sequence <SEQ ID 606; ORF142-l>: 



1 MDNSGSEATG KYQGNITFSA DNPLGLSDMF YVNYGRSIGG TPDEESFDGH 

51 RKEGGSNNYA VHYSAPFGKW TWAFNHNGYR YHQAVSGLSE VYDYNGKSYN 

101 TDFGFNRLLY RDAKRKTYLG VKLWMRETKS YIDDAELTVQ RRKTAGWLAE 

151 LSHKEYIGRS TADFKLKYKR GTGMKDALRA PEEAFGEGTS RMKIWTASAD 

201 VNTPFQIGKQ LFAYDTSVHA QWNKTPLTSQ DKLAIGGHHT VRGFDGEMSL 

251 SAERGWYWRN DLSWQFKPGH QLYLGADVGH VSGQSJ5AWLS GQTLVGTAIG 

301 IRGQIKLGGN LHYDIFTGRA LKKPEFFQSR KWASGFQVG Y TF * 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N.sonorrhoeae 

ORF142 shows 88.1% identity over a 59aa overlap with a predicted ORF (ORF142ng) from 
N. gonorrhoeae: 



orfl42.pep QSAKVJLSGQTLVGTAIGTRGQIKLGGNLHY 30 

orfl42ng RGWYWRNDLSWQFKPGHQLYLGADVGHVSGQSAKWLSGQTLAGTAIGIRGQIKLGGNLHY 313 

orfl42.pep DIFTGRALKKPEFFQSRKWASGFQVGYTF 59 

I I I I I i I I i I I I : I 1 : : I I : : I I I I I I : ! 

orfl42ng DIFTGRALKKPEYFQTKKWVTGFQVGYSF 34 2 

The complete length ORF142ng nucleotide sequence <SEQ ID 607> is: 

1 ATGGATAATT CGGGTAGTGA GGCGACAGGA AAATACCAAG GAAATATCAC 

51 TTTCTCTGCC GACAATCCTT TTGGACTGAG TGATATGTTC TATGTAAATT 

101 ATGGACGTTC AATTGGCGGT ACGCCCGATG AGGAAAATTT TGACGGCCAT 

151 CGCAAAGAAG GCGGATCAAA CAATTACGCC GTACATTATT CAGCCCCTTT 

201 CGGTAAATGG ACATGGGCAT TCAATCACAA TGGCTACCGT TACCATCAGG 

251 CGGTTTCCGG ATTATCGGAA GTCTATGACT ATAATGGAAA AAGTTACAAC 

301 ACTGATTTCG GCTTCAACCG CCTGTTGTAT CGTGATGCCA AACGCAAAAC 

351 CTATCTCAGT GTAAAACTGT GGACGAGGGA AACAAAAAGT TACATTGATG 

401 ATGCCGAACT GACTGTACAA CGGCGTAAAA CCACAGGTTG GTTGGCAGAA 

451 CTTTCCCACA AAGGATATAT CGGTCGCAGT ACGGCAGATT TTAAGTTGAA 

501 ATATAAACAC GGCACCGGCA TGAAAGATGC TCTGCGCGCG CCTGAAGAAG 

551 CCTTTGGCGA AGGCACGTCA CGTATGAAAA TTTGGACGGC ATCGGCTGAT 

601 GTAAATACTC CTTTTCAAAT CGGTAAACAG CTATTTGCCT ATGACACATC 

651 CGTTCATGCA CAATGGAACA AAACCCCGCT AACATCGCAA GACAAACTGG 

701 CTATCGGCGG ACACCACACC GTACGTGGCT TCGACGGTGA AATGAGTTTG 

751 CCTGCCGAGC GGGGATGGTA TTGGCGCAAC GATTTGAGCT GGCAATTTAA 

801 ACCAGGCCAT CAGCTTTATC TTGGGGCTGA TGTAGGACAT GTTTCAGGAC 

851 AATCCGCCAA ATGGTTATCG GGCCAAACTC TAGCCGGCAC AGCAATTGGG 

901 ATACGCGGGC AGATAAAGCT TGGCGGCAAC CTGCATTACG ATATATTTAC 

951 CGGCCGTGCA TTGAAAAAGC CCGAATATTT TCAGACGAAG AAATGGGTAA 
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1001 CGGGGTTTCA GGTGGGTTRT TCGTTTTGA 

This encodes a protein having amino acid sequence <SEQ ID 608>: 

1 MDNSGSEATG KYQGNITFSA DNPFGLSDMF YVNYGRSIGG TPDEENFDGH 

51 RKEGGSNNYA VHYSAPFGKW TWAFNHNGYR YHQAVSGLSE VYDYNGKSYN 

5 101 TDFGFNRLLY RDAKRKTYLS VKLWTRETKS YIDDAELTVQ RRKTTGWLAE 

151 LSHKGYIGRS TADFKLKYKH GTGMKDALRA PEEAFGEGTS RMKIWTASAD 

201 VNTPFQIGKQ LFAYDTSVHA QWNKTPLTSQ DKLAIGGHHT VRGFDGEMSL 

251 PAERGWYWRN DLSWQFKPGH QLYLGADVGH VSGQSAKWLS GQTLAGTAIG 

301 IRGQIKLGGN LHYDIFTGRA LKKPEYFQTK KWVTGFQVG Y SF * 

10 The underlined sequence (aromatic-Xaa-aromatic amino acid motif) is usually found at the 
C-terminal end of outer membrane proteins. 

ORF142ng and ORF142-1 show 95.6% identity over 342aa overlap: 

orf 142-1. pep MDNSGSEATGKYQGNITFSADNPLGLSDMFYVNYGRSIGGTPDEESFDGHRKEGGSNNYA 

15 orfl42ng-l MDNSGSEATGKYQGNITFSADNPFGLSDMFYVNYGRSIGGTPDEENFDGHRKEGGSNNYA 

orf 142-1. pep VHYSAPFGKWTWAFNHNGYRYHQAVSGLSEVYDYNGKSYNTDFGFNRLLYRDAKRKTYLG 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : 
orfl42ng-l VHYSAPFGKWTWAFNHNGYRYHQAVSGLSEVYDYNGKSYNTDFGFNRLLYRDAKRKTYLS 



orf 142-1. pep VKLWMRETKSYIDDAELTVQRRKTAGWLAELSHKEYIGRSTADFKLKYKRGTGMKDALRA 
orfl42ng-l VKLWTRETKSYIDDAELTVQRRKTTGWLAELSHKGYIGRSTADFKLKYKHGTGMKDALRA 
orf 142-1 .pep PEEAFGEGTSRMKIWTASADVNTPFQIGKQLFAYDTSVHAQWNKTPLTSQDKLAIGGHHT 
orfl42ng-l PEEAFGEGTSRMKIWTASADVNTPFQIGKQLFAYDTSVHAQWNKTPLTSQDKLAIGGHHT 
orf 142-1. pep VRGFDGEMSLSAERGWYWRNDLSWQFKPGHQLYLGADVGHVSGQSAKWLSGQTLVGTAIG 
orfl42ng-l VRGFDGEMSLPAERGWYWRNDLSWQFKPGHQLYLGADVGHVSGQSAKWLSGQTLAGTAIG 
orf 142-1. pep IRGQIKLGGNLHYDIFTGRALKKPEFFQSRKWASGFQVGYTF 
orfl42ng-l IRGQIKLGGNLHYDIFTGRALKKPEYFQTKKWVTGFQVGYSF 

In addition, ORF142ng is homologous to the HecB protein of E.chrysanthemi: 

gi 1 1772622 (L39897) HecB [Erwinia chrysanthemi] Length = 558 
Score = 119 bits (295), Expect = 3e-26 

Identities = 88/346 (25%), Positives = 151/346 (43%), Gaps = 22/346 (6%) 

Query: 2 DNSGSEATGKYQGNITFSADNPFGLSDMFYVNYGRSIGGTPDEENFDGHRKEGGSNNYAV 61 

DNSG ++TG+ Q N + + DN FGL+D ++++ G S + + D + G 
Sbjct: 230 DNSGQKSTGEEQLNGSLALDNVFGLADQWFISAGHS— SRFATSHDAESLQAG 280 

Query: 62 HYSAPFGKWTWAFNHNGYRYHQAVSGLSEVYDYNGKSYNTDFGFNRLLYRDAKRKTYLSV 121 

+ S P+G W +N++ RY + G S F +R+++RD KT ++ 

Sbjct: 281 -FSMPYGYWNLGYNYSQSRYRNTFINRDFPWHSTGDSDTHRFSLSRWFRDGTMKTAIAG 339 

Query: 122 KLWTRETKSYIDDAELTVQRRKTTGWLAELSHKGYIGRSTADFKLKYKHGTGMKDALRAP 181 

R +Y++ + L RK + ++H + A F Y G + 

Sbjct: 340 TFSQRTGNNYLNGSLLPSSSRKLSSVSLGVNHSQKLWGGLATFNPTYNRGVRWLGSETDT 399 

Query: 182 EEAFGEGTSRMKIWTASADVNTPFQIGKQLFAYDTSVHAQWNKTPLTSQDKLAIGGHHTV 241 

+++ E + WT SA P Y S++ Q++ L ++L +GG ++ 

Sbjct: 400 DKSADEPRAEFNKWTLSASYYHPV TDSITYLGSLYGQYSARALYGSEQLTLGGESSI 456 

Query: 242 RGFDGEMSLPAERGWYWRNDLSWQFKP GHQLYLGA-DVGHVSGQSAKWLSGQTLAG 296 

RGF E RG YWRN+L+WQ G+ ++ A D GH+ + +L G 

Sbjct: 457 RGF-REQYTSGNRGAYWRNELNWQAWQLPVLGNVTFMAAVDGGHLYNHKQDNSTAASLWG 515 
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Sbjct: 516 GAVGMTVASRW LSQQVTVGWPISYPAWLQPDTMWGYRVGLSF 558 

On the basis of this analysis, it is predicted that the proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 

5 Example 73 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 609>: 

1 ATGCGGACGA AATGGTCAGC AGTGAGAAGC TGCIIACTTG GgCGGACACC 

51 GCCGACATCG ATACCGCTTT GAACCTGTTG TACCGTTTGC AAAAACTCGA 

101 ATTCCTCTAT GGCGATGAAA ACGGTCATTC AGACGGCATC AATTTGwCGG 

10 151 ACGAGCAATT GCCGTTGCTG ATGGAACAAT TGTCCGGCAG CGGTAAGGCG 

201 TTATTGGTCG ATCGGAACGG TCTGTATCTT GCCAACGCCA ATTTCCATCA 

251 TGAGGCGGCG GAAGAGTTGG GGTTGTTGGC GGCAGAAGTC GCACAGATGG 

301 AAAAGAAATA CCGGCTGCTG ATTAAGAACA AC. 

This corresponds to the amino acid sequence <SEQ ID 610; ORF143>: 

15 1 MRTKWSAVRS CrfMOTADID TALNLLYRLQ KLEFLYGDEN GHSDGINLXD 

51 EQLPLLMEQL SGSGJCALLVD RNGLYLANAN FHHEAAEELG LLAAEVAQME 
101 KKYRLLIKNN . . 

Further work revealed the complete nucleotide sequence <SEQ ID 61 1>: 

1 ATGGAATCAA CACTTTCACT ACAAGCAAAT TTATATCCCC GCCTGACTCC 

20 51 TGCCGGTGCA TTTTATGCCG TATCCAGCGA TGCCCCCAGT GCCGGTAAAA 

101 CTTTGTTGCA CAGCCTGTTG AAAGCAGATG CGGACGAAAT GGTCAGCAGT 

151 GAGAAGCTGC TTACTTGGGC GGACACCGCC GACATCGATA CCGCTTTGAA 

201 CCTGTTGTAC CGTTTGCAAA AACTCGAATT CCTCTATGGC GATGAAAACG 

251 GTCATTCAGA CGGCATCAAT TTGTCGGACG AGCAATTGCC GTTGCTGATG 

25 301 GAACAATTGT CCGGCAGCGG TAAGGCGTTA TTGGTCGATC GGAACGGTCT 

351 GTATCTTGCC AACGCCAATT TCCATCATGA GGCGGCGGAA GAGTTGGGGT 

401 TGTTGGCGGC AGAAGTCGCA CAGATGGAAA AGAAATACCG GCTGCTGATT 

451 AAGAACAACC TGTATATCAA CAATAACGCT TGGGGCGTTT GCGATCCTTC 

501 CGGTCAGAGC GAATTGACAT TTTTCCCATT GTATATCGGT TCAACCAAAT 

30 551 TTATTTTGGT TATCGGCGGC ATTCCCGATT TGGGCAAAGA GGCATTTGTT 

601 ACTTTGGTAA GGATTTTATA CCGCCGTTAC AGCAACCGCG TGTAA 

This corresponds to the amino acid sequence <SEQ ID 612; ORF143-l>: 

1 MESTLSLQAN LYPRLTPAGA FYAVSSDAPS AGKTLLHSLL KADADEMVSS 

51 EKLLTMDTA DIDTALNLLY RLQKLEFLYG DENGHSDGIN LSDEQLPLLM 

35 101 EQLSGSGKAL LVDRNGLYLA NANFHHEAAE ELGLLAAEVA QMEKKYRLLI 

151 KNNLYINNNA WGVCDPSGQS ELT FFPLYIG STKFILVIGG IPDLGKEAEV 

201 TLVRILYRRY SNRV* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 
40 ORF143 shows 92.4% identity over a 105aa overlap with an ORF (ORF143a) from strain A of A^. 
meningitidis: 

10 20 30 

ori'143 . pep MRTKWSAVRSCTWADTADIDTALNLLYRLQKLEFL 

45 Orfl43a GAFYAVS S DXPSAGKTLLHSLLKADADEMVS SEKLLTWAXTAD I DTALNLLYRLQKLEFL 



50 



orfl43.pep 



40 50 60 70 80 90 

YGDENGHSDGINLXDEQLPLLMEQLSGSGKALLVDRNGLYLANANFHHEAAEELGLLAAE 
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YGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLANANFHHEAAEELGLLAAE 



.rfl43.pep 
irfl43a 



VAQMEKKYRLLIKNN 

VAQMEKKYRLXIKNNLYINNNAWGVCDPSGQSELT FFPLYIGSTKFILVIGG IPDLGKEA 



The complete length ORF143a nucleotide sequence <SEQ ID 613> is: 



ATGGAATCAA 
TGCCGGTGCA 
CTTTGTTGCA 
GAGAAGCTGC 
CCTGTTGTAC 
GTCATTCAGA 
GAACAATTGT 
GTATCTTGCC 
TGTTGGCGGC 
AAGAACAACC 
CGGTCAGAGC 
TTATTTTGGT 
ACTTTGGTAA 
TGGGAGAGAG 



CANTTTCACT 
TTTTATGCCG 
CAGCCTGTTG 
TTACCTGGGC 
CGTTTGCAAA 
CGGCATCAAT 
CCGGCAGCGG 
AACGCCAATT 
AGAAGTCGCA 
TGTATATCAA 
GAATTGACAT 
TATCGGCGGC 
GGATNTTATA 
GANGGGTTAT 



ACAAGCAAAT 
TATCCAGCGA 
AAAGCGGATG 
GGANACCGCC 
AACTCGAATT 
TTGTCGGACG 
TAAGGCGTTA 
TCCATCATGA 
CAGATGGAAA 
CAATAACGCT 
TTTTCCCATT 
ATTCCCGATT 
CCNCCNGTTA 
GCAGCAATTA 



TTATATCNCC 
TGNCCCCAGT 
CGGACGAAAT 
GACATCGATA 
CCTCTATGGC 
AGCAATTGCC 
TTGGTCGATC 
GGCGGCGGAA 
AGAAATACCG 
TGGGGCGTTT 
GTATATCGGT 
TGGGCAAAGA 
CAGCAACCGC 
TTGA 



GCCTGACTCC 
GCCGGTAAAA 
GGTNAGCAGT 
CCGCTTTGAA 
GATGAAAACG 
GTTGCTGATG 
GGAACGGTCT 
GAGTTGGGGT 
GCTGCNNATT 
GCGATCCTTC 
TCAACCAAAT 
GGCATTTGTT 
GTGTAAAACT 



This encodes a protein having amino acid sequence <SEQ ID 614>: 

1 MESTXSLQAN LYXRLTPAGA FYAVSSDXPS AGKTLLHSLL KADADE^4VSS 

51 EKLLTWAXTA DIDTALNLLY RLQKLEFLYG DENGHSDGIN LSDEQLPLLM 

101 EQLSGSGKAL LVDRNGLYLA NANFHHEAAE ELGLLAAEVA QMEKKYRLXI 

151 KNNLYINNNA WGVCDPSGQS ELT FFPLYIG STKFILVIGG IPDLGKEAFV 

201 TLVRXLYXXL QQPRVKLGRE XGLCSNY'' 

ORF143a and ORF143-1 show 97.1% identity in 207 aa overlap: 



MESTXSLQANLYXRLTPAGAFYAVSSDXPSAGKTLLHSLLKADADEMVSSEKLLTWAXTA 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 
MESTLSLQANLYPRLTPAGAFYAVSSDAPSAGKTLLHSLLKADADEMVSSEKLLTWADTA 



orfl43a.pep 
orfl43-l 
orf 143a. pep 
orfl43-l 
orf 143a. pep 
orfl43-l 

orfl43-l 

Homology with a predicted ORF from N.sonorrhoeae 

ORF143 shows 95.5% identity over a llOaa overlap with a predicted ORF (ORF143ng) from 
N. gonorrhoeae: 

MRTKWSAVRSCTWADTADI DTALNLLYRLQKLE FLYGDENGHS DGINLXDEQLPLLMEQL 6 0 
II I II I I I II I : I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
MRTKWSAVRSCSRADTADIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQL 60 



DIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLA 

DIDTALNLLYRLQKLE FLYGDENGHS DGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLA 

NANFHHEAAEELGLLAAEVAQMEKKYRLXIKNNLYINNNAWGVCDPSGQSELTFFPLYIG 
I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
NANFHHEAAEELGLLAAEVAQMEKKYRLLIKNNLYINNNAWGVCDPSGQSELTFFPLYIG 

STKFILVIGGIPDLGKEAFVTLVRXLY 

STKFILVIGGIPDLGKEAFVTLVRILY 



orfl43.pep 
orf 143ng 

orf 143ng 

An ORF143ng nucleotide sequence <SEQ ID 615> was predicted to encode a protein having amino 
acid sequence <SEQ ID 616>: 



SGSGKALLVDRNGLYLANANFHHEAAEELGLLAAEVAQMEKKYRLLIKNN 
SGSGKALLVDRNGLYLANANFHHESAEELGLLAAEVAQMEKKYRLLIRNNLYINNNAWGV 
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1 MRTKWSAVRS CSRADTADID TALNLLYRLQ KLEFLYGDEN GHSDGINLSD 

51 EQLPLLMEQL SGSGKALLVD RNGLYLANAN FHHESAEELG LLAAEVAQME 

101 KKYRLLIRNN LYINNNAWGV CDPSGQSELT FFPLYIGSTK FILVIAGI PD 

151 LSKGGICYFG KDFIPPLQQP RVKLGTGGIM RQLLISILED LNNTSTDIIA 

201 SAVISTDGLP MATMLPSHLN SDRVGAISAT LLALGSRSVQ ELACGELEQV 

251 MIKGKSGYIL LSQAGKDAVL VLVAKETG RL GLILLDAKRA ARHIA EAI* 



Further work revealed the following gonococcal DNA sequence <SEQ ID 617>: 



1 ATGGAATCAA CACTTTCACT ACAAGCGAAT TTATATCCCT GCCTGACTCC 

51 TGCCGGTGCA TTTTATGCCG TATCCAGCGA TGCCCCCAGT GCCGGTAAAA 

101 CTTTGTTGCG CAGCCTGTTG AAAGCGGATG CGGACGAAGT GGTCAGCAGT 

151 GAGAAGCTGC TCGCGGCGGA CACCGCCGAC ATCGATACCG CTTTGAACCT 

201 GTTGTACCGT TTGCAAAAAC TCGAATTCCT CTATGGCGAT GAAAACGGTC 

251 ATTCAGACGG CATCAATTTG TCGGACGAGC AATTGCCGTT GCTGATGGAA 

301 CAATTGTCCG GCAGCGGTAA GGCATTATTG GTCGATCGGA ACGGTCTGTA 

351 TCTTGCCAAC GCCAATTTCC ATCATGAGTC GGCGGAAGAG TTGGGGTTGT 

401 TGGCGGCAGA AGTCGCACAG ATGGAAAAGA AATACCGGCT GCTGATTAGG 

451 AACAACCTGT ATATCAACAA TAACGCTTGG GGCGTTTGCG ATCCTTCCGG 

501 TCAGAGCGAA TTGACATTTT TCCCATTGTA TATCGGTTCA ACCAAATTTA 

551 TTTTGGTTAT CGCCGGCATT CCCGATTTGA GCAAAGAGGC ATTTGTTACT 

601 TTGGTAAGGA TTTTATACCG CCGTTACAGC AACCGCGTGT AA 

This corresponds to the amino acid sequence <SEQ ID 618; ORF143ng-l>: 



1 MESTLSLQAN LYPCLTPAGA FYAVSSDAPS AGKTLLRSLL KADADEWSS 

51 EKLLAADTAD IDTALNLLYR LQKLEFLYGD ENGHSDGINL SDEQLPLLME 

101 QLSGSGKALL VDRNGLYLAN ANFHHESAEE LGLLAAEVAQ MEKKYRLLIR 

151 NNLYINNNAW GVCDPSGQSE LT FFPLYIGS TKFILVIAGI PDLSKEAFVT 

201 LVRILYRRYS NRV* 

ORF143ng-l and ORF143-1 show 95.8% identity in 214 aa overlap: 



orf 14 3ng-l .pep MESTLSLQANLYPCLTPAGAFYAVSSDAPSAGKTLLRSLLKADADEVVSSEKLLA-ADTA 59 
Orfl4 3-l MESTLSLQANLYPRLTPAGAFYAVSSDAPSAGKTLLHSLLKADADEMVSSEKLLTWADTA 60 

orf 143ng-l .pep DIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLA 119 
orf 14 3-1 DIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLA 12 0 

orfl4 3ng-l.pep NANFHHESAEELGLLAAEVAQMEKKYRLLIRNNLYINNNAWGVCDPSGQSELTFFPLYIG 17 9 
orf 14 3-1 NANFHHEAAEELGLLAAEVAQMEKKYRLLIKNNLYINNNAWGVCDPSGQSELTFFPLYIG 180 

orfl43ng-l.pep STKFILVIAGIPDLSKEAFVTLVRILYRRYSNRV 213 
orfl43-l STKFILVIGGIPDLGKEAFVTLVRILYRRYSNRV 214 

Based on the presence of the putative transmembrane domains in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and K gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 74 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 619>: 

1 ATGACCTTTT TACAACGTTT GCAAGGTTTG GCAGACAATA AAATCTGTGC 

51 GTTTGCATGG TTCGTCGTCC GCCGCTTTGA TGAAGAACGC GTACCGCAGr 

101 CGGCGGCAAG CATGACGTTT ACGACGCTGC TGGCACTCGT CCCCGTGCTG 

151 ACCGTGATGG TGGCGGTCGC TTCGATTTTC CCCGTGTTCG ACCGCTGGTC 

201 GGATTCGTTC GTCTCCTTCG TCAACCAAAC CATTGTGCCG CA.GGCGCGG 

251 ACATGGTGTT CGACTATATC AATGCGTTCC GCGAGCAGGC GAACCGGCTG 

301 ACGGCAATCG GCAGCGTGAT GCTGGTCGTT ACCTCGCTGA TGCTGATTCG 

351 GACGATAGAC AATACGTTCA ACCGCATCTG GaCGGGTCAA wTyCCAGCGT 

401 CCGTGGATG.. 
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This corresponds to the amino acid sequence <SEQ ID 620; ORF144>: 

1 MTFLQRLQGL ADNKICAFAW FWRRFDEER VPQXAASMTF TTLLALVPVL 
51 TVMVAVASIF PVFDRWSDSF VSFVNQTIVP XGADMVFDYI NAFREQANRL 
101 TAIGSVMLW TSLMLIRTID NTFNRIWRVX XQRPWM. . . 

Further work revealed the complete nucleotide sequence <SEQ ID 621>: 



1 ATGACCTTTT TACAACGTTT GCAAGGTTTG GCAGACAATA AAATCTGTGC 

51 GTTTGCATGG TTCGTCGTCC GCCGCTTTGA TGAAGAACGC GTACCGCAGG 

101 CGGCGGCAAG CATGACGTTT ACGACGCTGC TGGCACTCGT CCCCGTGCTG 

151 ACCGTGATGG TGGCGGTCGC TTCGATTTTC CCCGTGTTCG ACCGCTGGTC 

201 GGATTCGTTC GTCTCCTTCG TCAACCAAAC CATTGTGCCG CAGGGCGCGG 

251 ACATGGTGTT CGACTATATC AATGCGTTCC GCGAGCAGGC GAACCGGCTG 

301 ACGGCAATCG GCAGCGTGAT GCTGGTCGTT ACCTCGCTGA TGCTGATTCG 

351 GACGATAGAC AATACGTTCA ACCGCATCTG GCGGGTCAAT TCCCAGCGTC 

4 01 CGTGGATGAT GCAGTTTCTC GTCTATTGGG CTTTACTGAC GTTCGGGCCG 

4 51 CTGTCTTTGG GCGTGGGCAT TTCCTTTATG GTCGGCTCGG TACAGGATGC 

501 CGCGCTTGCC TCAGGTGCGC CGCAGTGGTC GGGCGCGTTG CGAACGGCGG 

551 CGACGCTGAC CTTCATGACG CTTTTGCTGT GGGGGCTGTA CCGCTTCGTG 

601 CCAAACCGCT TCGTTCCCGC GCGGCAGGCG TTTGTCGGGG CTTTGGCAAC 

651 AGCGTTTTGT CTGGAAACCG CGCGCTCCCT CTTCACTTGG TATATGGGCA 

7 01 ATTTCGACGG CTACCGCTCG ATTTACGGCG CGTTTGCCGC CGTGCCGTTT 

7 51 TTTCTGTTGT GGCTGAACCT GTTGTGGACG CTGGTCTTGG GCGGCGCGGT 

801 GCTGACTTCT TCACTCTCCT ACTGGCAGGG AGAAGCGTTC CGCAGGGGCT 

851 TCGACTCGCG CGGACGGTTT GACGACGTGT TGAAAATCCT GCTGCTTCTG 

901 GATGCGGCGC AAAAAGAAGG CAAAGCCTTG CCTGTTCAGG AGTTCAGACG 

951 GCATATCAAT ATGGGCTACG ACGAGTTGGG CGAGCTTTTG GAAAAGCTGG 

1001 CGCGGCACGG CTACATCTAT TCCGGCAGAC AGGGTTGGGT GTTGAAAACG 

1051 GGGGCGGATT CGATTGAGTT GAACGAACTC TTCAAGCTCT TCGTTTACCG 

1101 TCCGTTGCCT GTGGAAAGGG ATCATGTGAA CCAAGCTGTC GATGCGGTAA 

1151 TGACACCGTG TTTGCAGACT TTGAACATGA CGCTGGCAGA GTTTGACGCT 

12 01 CAGGCGAAAA AACGGCAGTA G 

This corresponds to the amino acid sequence <SEQ ID 622; ORF144-l>: 



1 MTFLQRLQGL ADNKICAFA W FWRRFDEER VPQAAASMTF TT LLALVPVL 

51 TVMVAVASI F PVFDRWSDSF VSFVNQTIVP QGADMVFDYI NAFREQANRL 

101 TAIGSVMLW TSLMLI RTID NTFNRIWRVN SQRPWMMQFL VYW ALLTFGP 

151 LSLGVGISFM V GSVQDAALA SGAPQWSGAL RTAATLTFMT LLLWGLYRFV 

201 PNRFVPARQA FVGALATAFC LETARSLFTW YMGNFDGYRS lYGAFA AVPF 

251 FLLWLNLLWT LVL GGAVLTS SLSYWQGEAF RRGFDSRGRF DDVLKILLLL 

301 DAAQKEGKAL PVQEFRRHIN MGYDELGELL EKLARHGYIY SGRQGWVLKT 

351 GADSIELNEL FKLFVYRPLP VERDHVNQAV DAVMTPCLQT LNMTLAEFDA 

4 01 QAKKRQ* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N.meninsitidis (strain A) 

ORF144 shows 96.3% identity over a 136aa overlap with an ORF (ORF144a) from strain A of iV: 
meningitidis: 

10 20 30 40 50 60 

orf 14 4 .pep MTFLQRLQGLADNKICAFA WFWRRFDEERVPQXAASMTFTT LLALVPVLTVMVAVASI F 

orfl44a MTFLQRLQGLADNKICAFA WFWRRFDEERVPQAAASMTFTT LLALVPVLTVMVAVASI F 
10 20 30 40 50 60 

70 80 90 100 110 120 

orf 144 .pep PVFDRWSDSFV5FVNQTIVPXGADMVFDYINAFREQAHR LTAIGSVMLWTSLML IRTID 

orf 14 4a PVFDRWSDSFVSFVNQTIVPQGADMVFDYINAFREQANR LTAIGSVMLWTSXML IRTID 
70 80 90 100 110 120 

130 

orf 144. pep NTFNRIWRVXXQRPWM 

orf 14 4a NTFNRIWRVNSQRPWMMQFLVYW ALLTFGPLSLGVGISFXV GSVQDAALASGAPQWSGAL 



wo 99/24578 



-349- 



PCT/IB98/01665 



The complete length ORF144a nucleotide sequence <SEQ ID 623> is: 



1001 
1051 
1101 
1151 
1201 



ATGACCTTTT 
GTTTGCATGG 
CGGCGGCAAG 
ACCGTGATGG 
GGATTCGTTC 
ACATGGTNTT 
ACGGCAATCG 
GACGATAGAC 
CGTGGATGAT 
CTGTCTTTGG 
CGCGCTTGCC 
CGACGCTGAN 
CCAAACCGCT 
AGCGTTCTGT 
ATTTCGACGG 
TTTCTGTTGT 
GCTGACTTCT 
TCGACTCGCG 
GATGCGGCGC 
GCATATCAAT 
CGCGGCACGG 
GGGGCGGATT 
TCCGTTGCCT 
TGATGCCGTG 
CAGGCGAAAA 



TACAACGTTT 
TTCGTCGTCC 
CATGACGTTT 
TGGCGGTCGC 
GTCTCCTTCG 
CGACTATATC 
GCAGCGTGAT 
AATACGTTCA 
GCAGTTTCTC 
GCGTGGGCAT 
TCAGGTGCGC 
CTTCATGACG 
TCGTTCCCGC 
CTGGAAACCG 
CTACCGCTCG 
GGCTGAACCT 
TCACTCTCCT 
CGGACGGTTT 
AAAAAGAAGG 
ATGGGCTACG 
CTACATCTAT 
CGATTGAGTT 
GTGGAAAGGG 
TTTGCAGACT 
AACAGCAGCA 



GCAAGGTTTG 
GCCGCTTTGA 
ACGACACTGC 
TTCGATTTTC 
TCAACCAAAC 
AATGCGTTCC 
GCTGGTCGTT 
ACCGCATCTG 
GTCTATTGGG 
TTCCTTTATN 
CGCAGTGGTC 
CTTTTGCTGT 
GCGGCANGCG 
CGCGTTCCCT 
ATTTACGGNG 
GTTGTGGACG 
ACTGGCAGGG 
GACGACGTGT 
CNAAGCCTTG 
ACGAGTTGGG 
TCCGGCAGAC 
GAACGAACTC 
ATCATGTGAA 
TTGAACATGA 
ATCTTGA 



GCAGACAATA 
TGAAGAACGC 
TGGCACTCGT 
CCCGTGTTCG 
CATTGTGCCG 
GCGAGCAGGC 
ACCTCGCNGA 
GCGGGTCAAT 
CTTTACTGAC 
GTCGGCTCGG 
GGGCGCGTTG 
GGGGGCTGTA 
TTTGTCGGGG 
CTTTACTTGG 
CGTTTGCCGC 
CTGGTCTTGG 
AGAAGCGTTC 
TGAAAATCCT 
CCTGTTCAGG 
CGAGCTTTTG 
AGGGTTGGGT 
TTCAAGCTCT 
CCAAGCTGTC 
CGCTGGCAGA 



AAATCTGTGC 
GTACCGCAGG 
CCCCGTGCTG 
ACCGNTGGTC 
CAGGGCGCGG 
GAACCGGCTG 
TGCTGATTCG 
TCCCAGCGTC 
GTTCGGGCCG 
TACAGGATGC 
CGAACGGCGG 
CCGCTNCGTG 
CTTTGGCAAC 
TATATGGGCA 
CGTGCCGTTT 
GCGGCGCGGT 
CGCAGGGNCT 
GCTGCTTCTG 
AGTTCAGACG 
GAAAAGCTGG 
GTTGAAAACG 
TCGTTTACCG 
GATGCGGTAA 
GTTTGACGCT 



This encodes a protein having amino acid sequence <SEQ ID 624>: 



1 MTFLQRLQGL ADNKICAFA W FVVRRFDEER VPQAAASMTF 

51 TVMVAVASI F PVFDRWSDSF VSFVNQTIVP QGADMVFDYI 

101 TAIG3VMLW TSXMLI RTID NTFNRIWRVN SQRPWMMQFL 

151 LSLGVGI5FX V GSVQDAALA SGAPQWSGAL RTAATLXFMT 

201 PNRFVPARXA FVGALATAFC LETARSLFTW YMGNFDGYRS 

251 FLLWLNLLWT LVL GGAVLT3 SLSYWQGEAF RRXFDSRGRF 

301 DAAQKEGXAL PVQEFRRHIN MGYDELGELL EKLARHGYIY 

351 GADSIELNEL FKLFVYRPLP VERDHVNQAV DAVMMPCLQT 

4 01 QAKKQQQS* 

ORF144a and ORF144-1 show 97.8% identity in 406 aa overlap: 



TTLLALVPVL 
NAFREQANRL 
VYW ALLTFGP 
LLLWGLYRXV 
lYGAF AAVPF 
DDVLKILLLL 
SGRQGWVLKT 
LNMTLAEFDA 



ori;144; 
orfl44- 
orfl44i 
orfl44- 
orfl44; 
orfl44- 
orfl44a 
orfl44- 



orfl44a 
orfl44-: 



MTFLQRLQGLADNKICAFAWFWRRFDEERVPQAAASMTFTTLLALVPVLTVMVAVASIF 

MTFLQRLQGLADNKICAFAWFWRRFDEERVPQAAASMTFTTLLALVPVLTVMVAVASIF 

PVFDRWSDSFVSFVNQTIVPQGADMVFDYINAFREQANRLTAIGSVMLWTSXMLIRTID 

PVFDRWSDSFVSFVNQTIVPQGADMVFDYINAFREQANRLTAIGSVMLVVT3LMLIRTID 

NTFNRIWRVNSQRPWMMQFLVYWALLTFGPLSLGVGISFXVGSVQDAALASGAPQWSGAL 

NTFNRIWRVNSQRPWMMQFLVYWALLTFGPLSLGVGISFMVGSVQDAALASGAPQWSGAL 

RTAATLXFMTLLLWGLYRXVPNRFVPARXAFVGALATAFCLETARSLFTWYMGNFDGYRS 

RTAATLTFMTLLLWGLYRFVPNRFVPARQAFVGALATAFCLETARSLFTWYMGNFDGYRS 

lYGAFAAVPFFLLWLNLLWTLVLGGAVLTSSLSYWQGEAFRRXFDSRGRFDDVLKILLLL 

lYGAFAAVPFFLLWLNLLWTLVLGGAVLTSSLSYWQGEAFRRGFDSRGRFDDVLKILLLL 

DAAQKEGXALPVQEFRRHINMGYDELGELLEKLARHGYIYSGRQGWVLKTGADSIELNEL 

DAAQKEGKALPVQEFRRHINMGYDELGELLEKLARHGYIYSGRQGWVLKTGADSIELNEL 

FKLFVYRPLPVERDHVNQAVDAVMMPCLQTLNMTLAEFDAQAKKQQQS 4 08 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I 
FKLFVYRPLPVERDHVNQAVDAVMTPCLQTLNMTLAEFDAQAKKRQ 4 0 6 
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Homology with a predicted ORF from N. gonorrhoeae 

ORF144 shows 91.2% identity over a 136aa overlap with a predicted ORF (ORF144ng) from 
N.gonorrhoeae: 

orf 144 .pep MTFLQRLQGLADNKICAFAWFVVRRFDEERVPQXAASMTFTTLLALVPVLTVMVAVASIF 60 

Mill II I II I I II I I I I I : I I I : II I I I I I I I I I II II I I I I I I II I I I II I I I I 
orfl44ng MTFLQCWQGSADNKICAFAWFVIRRFSEERVPQAAASMTFTTLLALVPVLTVMVAVASIF 50 

orf 144 .pep PVFDRWSDSFVSFVNQTIVPXGADMVFDYINAFREQANRLTAIGSVMLWTSLMLIRTID 120 

I I I I I I I I I i I I I I I j I I I I I I I I I I I I : I I I : I I I I I I I II II II II II I II I II II 
orfl4 4ng PVFDRWSDSFVSFVNQTIVPQGADMVFDYIDAFRDQANRLTAIGSVMLVVTSLMLIRTID 120 

orf 144. pep NTFNRIWRVXXQRPWM 136 

1:1111111 MINI 

orfl4 4ng NAFNRIWRVNTQRPWMMQFLVYWALLTFGPLSLGVGISFMVGSVQDSVLSSGAQQWADAL 180 

The complete length ORF144ng nucleotide sequence <SEQ ID 625> is predicted to encode a 
protein having amino acid sequence <SEQ ID 626>: 

1 MTFLQCWQGS ADNKICAFAW FVIRRFSEER VPQAAASMTF TT LLALVPVL 

51 TVMVAVASI F PVFDRWSDSF VSFVNQTIVP QGADMVFDYI DAFRDQANRL 

101 TAIGSVMLW TSLMLI RTID NAFNRIWRVN TQRPWMMQFL VYWALLTFGP 

151 LSLGVGISFM V GSVQDSVLS SGAQQWADAL KTAARLAFMT LLLWGLYRFV 

201 PNRFVPARQA FVGALITAFC LETARFLFTW YMGNFDGYRS lYGAFAAVPF 

251 FLLWLHLLWT LVL GGAVLTS SLSYWQGEAF RRGFDSRGRF DDVLKILLLL 

301 DAAQKEGRTL SVQEFRRHIN MGYDELGELL EKLARYGYIY SGRQGWVLKT 

351 GADSIELSEL FKLFVYRPLP VERDHVNQAV DAVMTPCLQT LNMTLAEFDA 

401 QAKKQQQS* 

Further work revealed the fciUowing gonococcal DNA sequence <SEQ ID 627>: 



1 ATGACCTTTT TACAACGTTG GCAAGGTTTG GCGGACAATA AAATCTGTGC 

51 ATTTGCATGG TTCGTCATCC GCCGTTTCAG TGAAGAGCGC GTACCGCAGG 

101 CAGCGGCGAG CATGACGTTT ACGACACTGC TGGCACTCGT CCCCGTACTG 

151 ACCGTAATGG TCGCGGTCGC TTCGATTTTC CCCGTGTTCG ACCGCTGGTC 

201 GGATTCGTTC GTCTCCTTCG TCAACCAAAC CATTGTGCCG CAGGGCGCGG 

251 ATATGGTGTT CGACTATATC GACGCATTCC GCGATCAGGC AAACCGGCTG 

301 ACCGCCATCG GCAGCGTGAT GCTGGTCGTA ACCTCGCTGA TGCTGATTCG 

351 GACGATAGAC AATGCGTTCA ACCGCATCTG GCGGGTTAAC ACGCAACGCC 

401 CCTGGATGAT GCAGTTCCTC GTTTATTGGG CGTTGCTGAC TTTCGGGCCT 

451 TTGTCTTTGG GTGTGGGCAT TTCCTTTATG GTCGGGTCGG TTCAAGACTC 

501 CGTACTCTCC TCCGGAGCGC AACAATGGGC GGACGCGTTG AAGACGGCGG 

551 CAAGGCTGGC TTTCATGACG CTTTTGCTGT GGGGGCTGTA CCGCTTCGTG 

601 CCCAACCGCT TCGTGCCCGC CCGGCAGGCG TTTGTCGGAG CTTTGATTAC 

651 GGCATTCTGC CTGGAGACGG CACGTTTCCT GTTCACCTGG TATATGGGCA 

701 ATTTCGACGG CTACCGCTCG ATTTACGGCG CATTTGCCGC CGTGCCGTTT 

751 TTCCTGCTGT GGTTAAACCT GCTGTGGACG CTGGTCTTGG GCGGGGCGGT 

801 GCTGACTTCG TCGCTGTCTT ATTGGCAGGG CGAGGCCTTC CGCAGGGGAT 

851 TCGACTCGCG CGGACGGTTT GACGACGTGT TGAAAATCCT GCTGCTTCTG 

901 GATGCGGCGC AAAAAGAAGG CCGAACCCTG TCCGTTCAGG AGTTCAGACG 

951 GCATATCAAT ATGGGTTACG ATGAATTGGG CGAGCTTTTG GAAAAGCTGG 

1001 CGCGGTACGG CTATATCTAT TCCGGCAGAC AGGGCTGGGT TTTGAAAACG 

1051 GGGGCGGATT CGATTGAGTT GAGCGAACTC TTCAAGCTCT TCGTGTACCG 

1101 CCCGTTGCct gtggaAAGGG ATCATGTGAA CCAAGCTGtc gaTGCGGTAA 

1151 TGAcgccgtG TTTGCAGACT TTGAACATGA CGCTGGCGGA GTTTGACGCT 

1201 CAGgcgAAAA AACAGCAGCA GTCTTGA 

This encodes a variant of ORF 1 44ng, having the amino acid sequence <SEQ ID 628; ORF 144ng- 1>: 

1 MTFLQRWQGL ADNKICAFA W FVIRRFSEER VPQAAASMTF TT LLALVPVL 

51 TVMVAVASI F PVFDRWSDSF VSFVNQTIVP QGADMVFDYI DAFRDQANRL 

101 TAIGSVMLW TSLMLI RTID NAFNRIWRVN TQRPWMMQFL VYW ALLTFGP 

151 LSLGVGISFM V GSVQDSVLS SGAQQWADAL KTAARLAFMT LLLWGLYRFV 

201 PNRFVPARQA FVGALITAFC LETARFLFTW YMGNFDGYRS lYGAFA AVPF 

251 FLLWLNLLWT LVL GGAVLTS SLSYWQGEAF RRGFDSRGRF DDVLKILLLL 

301 DAAQKEGRTL SVQEFRRHIN MGYDELGELL EKLARYGYIY SGRQGWVLKT 
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351 GADSIELSEL FKLFVYRPLP VERDHVNQAV DAVMTPCLQT LNMTLAEFDA 
401 QAKKQQQS* 

ORF144ng-l and ORF 144-1 show 94.1% identity in 406 aa overlap: 

orf 14 4ng-l . pep MTFLQRWQGLADNKICAFAWFVIRRFSEERVPQAAASMTFTTLLALVPVLTVMVAVASIF 

orf 14 4-1 MTFLQRLQGLADNKICAFAWFVVRRFDEERVPQAAASMTFTTLLALVPVLTVMVAVASIF 

orfl4 4ng-l.pep PVFDRWSDSFVSFVNQTIVPQGADMVFDYIDAFRDQANRLTAIGSVMLWTSLMLIRTID 
I I I I I I I I I I I I I I I I i I I I I I I I I I I I I I : I I I : I I I I I I I 1 I I I ! I I I I I I I I I I I I I 
10 orf 144-1 PVFDRWSDSFVSFVNQTIVPQGADMVFDYINAFREQANRLTAIGSVMLWTSLMLIRTID 

orfl4 4ng-l.pep NAFNRIWRVNTQRPWMMQFLVYWALLTFGPLSLGVGISFMVGSVQDSVLSSGAQQWADAL 

orfl44-l NTFNRIWRVNSQRPWMMQFLVYWALLTFGPLSLGVGI SFMVGSVQDAALASGAPQWSGAL 

15 

orfl44ng-l. pep KTAARLAFMTLLLWGLYRFVPNRFVPARQAFVGALITAFCLETARFLFTWYMGNFDGYRS 

orf 14 4-1 RTAATLTFMTLLLWGLYRFVPNRFVPARQAFVGALATAFCLETARSLFTWYMGNFDGYRS 

20 orf 144ng-l .pep lYGAFAAVPFFLLWLNLLWTLVLGGAVLTSSLSYWQGEAFRRGFDSRGRFDDVLKILLLL 

I I I I I I I I I ! I M I ! I I I I I I I I I I I I I I i I I I I I I I I I I I [ I I I I I I I I I I I I I ! I I I 
orf 14 4-1 lYGAFAAVPFFLLWLNLLWTLVLGGAVLTSSLSYWQGEAFRRGFDSRGRFDDVLKILLLL 

orfl4 4ng-l.pep DAAQKEGRTLSVQEFRRHINMGYDELGELLEKLARYGYIYSGRQGWVLKTGADSIELSEL 
25 I I I I I I I I I I I I I I i I I I I I I I I I I I I I I I I h I I ! I I I I I I I I I I I I I I I I ! I : i 

orf 14 4-1 DAAQKEGKALPVQEFRRHINMGYDELGELLEKLARHGYIYSGRQGWVLKTGADSIELNEL 

orf 144ng-l . pep FKLFVYRPLPVERDHVNQAVDAVMTPCLQTLNMTLAEFDAQAKKQQQS 

I I I I I I I I I i I I !! I I I I I ! I I I I I I I I I I I I I I I I I I I I I I I !: I 
30 orfl44-l FKLFVYRPLPVERDHVNQAVDAVMTPCLQTLNMTLAEFDAQAKKRQ 

On this basis of this analysis, including the identification of several putative transmembrane 
domains in the gonococcal protein, it is predicted that the proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 



35 Example 75 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 629>: 



, .AGACACGCCC GCCGCATCCG CATCGACACC GCCATCAACC CCGAACTGGA 
AGCCCTCGCC GAACACCTCC ACTACCAATG GCAGGGCTTC CTCTGGCTCA 
GCACCGATAT GCGTCAGGAA ATTTCCGCCC TCGTCATCCT GCTGCAACGC 
ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACaCC TGCGCCAAAG 
CCTGCTTGAA ACACGGGAAC ACGGCTGA 



This corresponds to the amino acid sequence <SEQ ID 630; ORF146>: 



45 Further work revealed the complete nucleotide sequence <SEQ ID 63 1>: 

1 ATGAACACCT CGCAACGCAA CCGCCTCGTC AGCCGCTGGC TCAACTCCTA 

51 CGAACGCTAC CGCTACCGCC GCCTCATCCA CGCCGTCCGG CTCGGCGGGG 

101 CCGTCCTGTT CGCCACCGCC TCCGCCCGGC TGCTCCACCT CCAACACGGC 

151 GAGTGGATAG GGATGACCGT CTTCGTCGTC CTCGGCATGC TCCAGTTTCA 

50 201 AGGGGCGATT TACTCCAAGG CGGTGGAACG TATGCTCGGC ACGGTCATCG 

251 GGCTGGGCGC GGGTTTGGGC GTTTTATGGC TGAACCAGCA TTATTTCCAC 

301 GGCAACCTCC TCTTCTACCT CACCGTCGGC ACGGCAAGCG CACTGGCCGG 

351 CTGGGCGGCG GTCGGCAAAA ACGGCTACGT CCCTATGCTG GCAGGGCTGA 

4 01 CGATGTGTAT GCTCATCGGC GACAACGGCA GCGAATGGCT CGACAGCGGA 

55 451 CTCATGCGCG CCATGAACGT CCTCATCGGC GCGGCCATCG CCATCGCCGC 
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501 CGCCAAACTG CTGCCGCTGA AATCCACACT GATGTGGCGT TTCATGCTTG 

551 CCGACAACCT GGCCGACTGC AGCAAAATGA TTGCCGAAAT CAGCAACGGC 

601 AGGCGCATGA CCCGCGAACG CCTCGAGGAG AACATGGCGA AAATGCGCCA 

651 AATCAACGCA CGCATGGTCA AAAGCCGCAG CCATCTCGCC GCCACATCGG 

701 GCGAAAGCCG CATCAGCCCC GCCATGATGG AAGCCATGCA GCACGCCCAC 

751 CGTAAAATCG TCAACACCAC CGAGCTGCTC CTGACCACCG CCGCCAAGCT 

801 GCAATCTCCC AAACTCAACG GCAGCGAAAT CCGGCTGCTT GACCGCCACT 

851 TCACACTGCT CCAAACCGAC CTGCAACAAA CCGTCGCCCT TATCAACGGC 

901 AGACACGCCC GCCGCATCCG CATCGACACC GCCATCAACC CCGAACTGGA 

951 AGCCCTCGCC GAACACCTCC ACTACCAATG GCAGGGCTTC CTCTGGCTCA 

1001 GCACCAATAT GCGTCAGGAA ATTTCCGCCC TCGTCATCCT GCTGCAACGC 

1051 ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACACC TGCGCCAAAG 

1101 CCTGCTTGAA ACACGGGAAC ACGGCTGA 

This corresponds to the amino acid sequence <SEQ ID 632; ORF146-l>: 

1 MNTSQRNRLV SRWLNSYERY RYRRLIHAVR LGGAVLFATA SARLLHLQHG 

51 EW IGMTVFW LGMLQFQGA I YSKAVER MLG TVIGLGAGLG VLWL NQHYFH 

101 GNLLFYLTVG TASALAGWAA VGKNGYVPML AGLTMCMLIG DNGSEWLDSG 

151 LMRAMN VLIG AAIAIAAAKL LPL KSTLMWR FMLADNLADC SKMIAEISNG 

201 RRMTRERLEE NMAKMRQINA RMVKSRSHLA ATSGESRISP AMMEAMQHAH 

251 RKIVNTTELL LTTAAKLQSP KLNGSEIRLL DRHFTLLQTD LQQTVALING 

301 RHARRIRIDT AINPELEALA EHLHYQWQGF LWLSTNMRQE ISALVILLQR 

351 TRRKWLDAHE RQHLRQSLLE TREHG* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N.meninsitidis (strain A) 

ORF146 shows 98.6% identity over a 74aa overlap with an ORF (ORF146a) from sfrain A of A^. 
meningitidis: 



orf 14 6 . pep RHARRIRIDTAINPELEALAEHLHYQWQGF 

orfl4 6a KLNGSEIRLLDRHFTLLQTDLQQTVALINGRHARRIRIDTAINPELEALAEHLHYQWQGF 
280 290 300 310 320 330 

40 50 60 70 

orf 14 6 . pep LWLSTDMRQEISALVILLQRTRRKWLDAHERQHLRQSLLETREHGX 

orf 14 6a LWLSTNMRQEISALVILLQRTRRKWLDAHERQHLRQSLLETREHSX 
340 350 360 370 

The complete length ORF 146a nucleotide sequence <SEQ ID 63 3> is: 



1 ATGAACACCT CGCAACGCAA 

51 CGAACGCTAC CGCTACCGCC 

101 CCGTCCTGTT CGCCACCGCC 

151 GAGTGGATAG GGATGACCGT 

201 AGGGGCGATT TACTCCAAGG 

251 GGCTGGGCGC GGGTTTGGGC 

301 GGCAACCTCC TCTTCTACCT 

351 CTGGGCGGCG GTCGGCAAAA 

401 CGATGTGCAT GCTCATCGGC 

451 CTGATGCGCG CGATGAACGT 

501 CGCCAAACTG CTGCCGCTGA 

551 CCGACAACCT GACCGACTGC 

601 AGGCGCATGA CCCGCGAACG 

651 AATCAACGCA CGCATGGTCA 

701 GCGAAAGCCG CATCAGCCCC 

751 CGTAAAATTG TCAACACCAC 

801 GCAATCTCCC AAACTCAACG 

851 TCACACTGCT CCAAACCGAC 

901 AGACACGCCC GCCGCATCCG 

951 AGCCCTCGCC GAACACCTCC 

1001 GCACCAATAT GCGTCAGGAA 

1051 ACCCGCCGCA AATGGCTGGA 

1101 CCTGCTTGAA ACACGGGAAC 



CCGCCTCGTC AGCCGCTGGC TCAACTCCTA 
GCCTCATCCA CGCCGTCCGG CTCGGCGGGG 
TCCGCCCGGC TGCTCCACCT CCAACACGGC 
CTTCGTCGTC CTCGGCATGC TCCAGTTTCA 
CGGTGGAACG TATGCTCGGC ACGGTCATCG 
GTTTTATGGC TGAACCAGCA TTATTTCCAC 
CACCGTCGGC ACGGCAAGCG CACTGGCCGG 
ACGGCTACGT CCCTATGCTG GCGGGGCTGA 
GACAACGGCA GCGAATGGTT CGACAGCGGC 
CCTCATCGGC GCGGCCATCG CCATCGCCGC 
AATCCACACT GATGTGGCGT TTCATGCTTG 
AGCAAAATGA TTGCCGAAAT CAGCAACGGC 
CCTCGAAGAG AACATGGCGA AAATGCGCCA 
AAAGCCGCAG CCACCTCGCC GCCACATCGG 
GCCATGATGG AAGCCATGCA GCACGCCCAC 
CGAGCTGCTC CTGACCACCG CCGCCAAGCT 
GCAGCGAAAT CCGGCTGCTT GACCGCCACT 
CTGCAACAAA CCGTCGCCCT TATCAACGGC 
CATCGACACC GCCATCAACC CCGAACTGGA 
ACTACCAATG GCAGGGCTTC CTCTGGCTCA 
ATTTCCGCCC TCGTCATCCT GCTGCAACGC 
TGCCCACGAA CGCCAACACC TGCGCCAAAG 
ACAGTTGA 
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This encodes a protein having amino acid sequence <SEQ ID 634>: 

1 MNTSQRNRLV SRWLNSYERY RYRRLIHAVR LGGAVLFATA SARLLHLQHG 

51 EW IGMTVFVV LGMLQFQGA I YSKAVER MLG TVIGLGAGLG VLWL NQHYFH 

101 GNLLFYLTVG TASALAGWAA VGKNGYVPML AGLTMCMLIG DNGSEWFDSG 

5 151 LMRAMN VLIG AAIAIAAAKL LPL KSTLMWR FMLADNLTDC SKMIAEISNG 

201 RRMTRERLEE NMAKMRQINA RMVKSRSHLA ATSGESRISP AMMEAMQHAH 

251 RKIVNTTELL LTTAAKLQSP KLNGSEIRLL DRHFTLLQTD LQQTVALING 

301 RHARRIRIDT AINPELEALA EHLHYQWQGF LWLSTNMRQE ISALVILLQR 

351 TRRKWLDAHE RQHLRQSLLE TREHS* 

10 ORF146a and ORF146-1 show 99.5% identity in 374 aa overlap: 



)rfl46a 
)rfl46-: 
)rfl46a 
)rfl46- 
)rfl46a 
)rfl46-: 
)rfl46a 
>rfl46-: 
)rfl46a 
)rfl46- 
)rfl46i 



)rfl46c 
)rfl46- 



MNTSQRNRLVSRWLNSYERYRYRRLIHAVRLGGAVLFATASARLLHLQHGEWIGMTVFW 
MNTSQRNRLVSRWLNSYERYRYRRLIHAVRLGGAVLFATASARLLHLQHGEWIGMTVFW 
LGMLQFQGAIYSKAVERMLGTVIGLGAGLGVLWLNQHYFHGNLLFYLTVGTASALAGWAA 
LGMLQFQGAI YSKAVERMLGTVIGLGAGLGVLWLNQHYFHGNLLFYLTVGTASALAGWAA 



pep 



pep 



VGKNGYVPMLAGLTMCMLIGDNGSEWFDSGLMRAMNVLIGAAIAIAAAKLLPLKSTLMWR 

VGKNGYVPMLAGLTMCMLIGDNGSEWLDSGLMRAMNVLIGAAIAIAAAKLLPLKSTLMWR 

FMLADNLTDCSKMIAEISNGRRMTRERLEENMAKMRQINARMVKSRSHLAATSGESRISP 
I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
FMLADNLADCSKMIAEISNGRRMTRERLEENMAKMRQINARMVKSRSHLAATSGESRISP 

AMMEAMQHAHRKIVNTTELLLTTAAKLQSPKLNGSEIRLLDRHFTLLQTDLQQTVALING 

AMMEAMQHAHRKIVNTTELLLTTAAKLQSPKLNGSEIRLLDRHFTLLQTDLQQTVALING 

RHARRIRIDTAINPELEALAEHLHYQWQGFLWLSTNMRQEISALVILLQRTRRKWLDAHE 

RHARRIRIDTAINPELEALAEHLHYQWQGFLWLSTNMRQEISALVILLQRTRRKWLDAHE 

RQHLRQSLLETREHSX 

RQHLRQSLLETREHGX 

Homology with a predicted ORF from N.sonorrhoeae 

ORF146 shows 97.3% identity over a 75aa overlap with a predicted ORF (ORF146ng) from 
N. gonorrhoeae: 

orfl4 6.pep RHARRIRIDTAINPELEALAEHLHYQWQGF 30 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orfl4 6ng KLNGSEIRLLDRHFTLLQTDLQQTAALINGRHARRIRIDTAINPELEALAEHLHYQWQGF 3 64 

orfl4 6.pep LWLSTDMRQEISALVILLQRTRRKWLDAHERQHLRQSLLETREHG 7 5 

orfl46ng LWLSTNMRQEISALVIPLQRTRRICWLDAHERQHLRQSLLETREHG 409 

An ORF146ng nucleotide sequence <SEQ ID 635> was predicted to encode a protein having amino 
50 acid sequence <SEQ ID 636>: 

1 MSGVRFPSPA PIPSTDPPSG SLCFFTFPLQ TASDWWSSQR KRLSGRWLNS 

51 YERYRHRRLI HAVRLGGTVL FATALARLLH LQHGEW IGMT VFWLGMLQF 

101 QGAIYSNAVE R MLGTVIGLG AGLGVLWL NQ HYFHGNLLFY LTIGTASALA 

151 GWAAVGKNGY VPMLAGLTMC MLIGDNGSEW LDSGLMRAMN VLIGAAIAIA 

55 201 AAKLLPLKST LMWRFMLADN LADCSKMIAE ISNGRRMTRE RLEQNMVKMR 

251 QINARMVKSR SHLAATSGES RISPSMMEAM QHAHRKIVNT TELLLTTAAK 

301 LQSPKLNGSE IRLLDRHFTL LQTDLQQTAA LINGRHARRI RIDTAINPEL 

351 EALAEHLHYQ WQGFLWLSTN MRQEISALVI PLQRTRRKWL DAHERQHLRQ 

401 SLLETREHG* 

60 Further work revealed the following gonococcal DNA sequence <SEQ ID 63 7>: 



45 
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1 ATGAACTCCT CGCAA.CGCAA ACGCCTTTCC GgccGCTGGC TCAACTCCTA 

51 CGAACGCTac cGCCaccGCC GCCTCATACA TGCCGTGCGG CTCGGCggaa 

101 ccgtCCTGTT CGCCACCGCA CTCGCCCGgc tACTCCACCT CCAacacggc 

151 gAATGGATAG GGAtgaCCGT CTTCGTCGTC CTCGGCATGC TCCAGTTCCA 

201 AGGCgcgatt tActccaacg cggtgGAacg taTGctcggt acggtcatcg 

251 ggctgGGCGC GGGTTTGGgc gTTTTATGGC TGAACCAGCA TTAtttccac 

301 ggcaacCTcc tcttctacct gaccatcggc acggcaagcg cactggocgg 

351 ctGGGCGGCG GTCGGCAAAA acggctacgt occtatgctg GCGGGGotgA 

401 CGATGTGCAT gctcatcggc gACAACGGCA GCGAATGGCT CGACAGCGGC 

451 CTGATGCGCG CGATGAACGT CCTCATCGGC GCCGCCATCG CCATTGCCGC 

501 CGCCAAACTG CTGCCGCTGA AATCCACACT GATGTGGCGT TTCATGCTTG 

551 CCGACAACCT GGCCGACTGC AGCAAAATGA TTGCCGAAAT CAGCAACGGC 

601 AGGCGTATGA CGCGCGAACG TTTGGAGCAG AATATGGTCA AAATGCGCCA 

651 AATCAACGCA CGCATGGTCA AAAGCCGCAG CCACCTCGCC GCCACATCGG 

701 GCGAAAGCCG CATCAGCCCC TCCATGATGG AAGCCATGCA GCACGCCCAC 

751 CGCAAAATCG TCAACACCAC CGAGCTGCTC CTGACCACCG CCGCCAAGCT 

801 GCAATCTCCC AAACTCAACG GCAGCGAAAT CCGGCTGCTC GACCGCCACT 

851 TCACACTGCT CCAAACCGAC CTGCAACAAA CCGCCGCCCT CATCAACGGC 

901 AGACACGCCC GCCGCATCCG CATCGACACC GCCATCAACC CCGAACTGGA 

951 AGCCCTCGCC GAACACCTCC ACTACCAATG GCAGGGCTTC CTCTGGCTCA 

1001 GCACCAATAT GCGTCAGGAA ATTTCCGCCC TCGTCATCCT GCTGCAACGC 

1051 ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACACC TGCGCCAAAG 

1101 CCTGCTTGAA ACACGGGAAC ACGGCTGA 

This corresponds to the amino acid sequence <SEQ ID 638; ORF146ng-l>: 



1 MNSSQRKRLS GRWLNSYERY RHRRLIHAVR LGGTVLFATA LARLLHLQHG 

51 EW IGMTVFVV LGMLQFQGA I YSNAVER MLG TVIGLGAGLG VLWL NQHYFH 

101 GNLLFYLTIG TASALAGWAA VGKNGYVPML AGLTMCMLIG DNGSEWLDSG 

151 LMRAMN VLIG AAIAIAAAKL LPL KSTLMWR FMLADNLADC SKMIAEISNG 

201 RRMTRERLEQ NMVKMRQINA RMVKSRSHLA ATSGESRISP SMMEAMQHAH 

251 RKIVNTTELL LTTAAKLQSP KLNGSEIRLL DRHFTLLQTD LQQTAALING 

301 RHARRIRIDT AINPELEALA EHLHYQWQGF LWLSTNMRQE ISALVILLQR 

351 TRRKWLDAHE RQHLRQSLLE TREHG* 

ORF146ng-l and ORF146-1 show 96.5% identity in 375 aa overlap 



MNTSQRNRLVSRWLNSYERYRYRRLIHAVRLGGAVLFATASARLLHLQHGEWIGMTVFW 

MNSSQRKRLSGRWLNSYERYRHRRLIHAVRLGGTVLFATALARLLHLQHGEWIGMTVFW 

LGMLQFQGAIYSKAVERMLGTVIGLGAGLGVLWLNQHYFHGNLLFYLTVGTASALAGWAA 

LGMLQFQGAIYSNAVERMLGTVIGLGAGLGVLWLNQHYFHGNLLFYLTIGTASALAGWAA 

VGKNGYVPMLAGLTMCMLIGDNGSEWLDSGLMRAMNVLIGAAIAIAAAKLLPLKSTLMWR 

VGKNGYVPMLAGLTMCMLIGDNGSEWLDSGLMRAMNVLIGAAIAIAAAKLLPLKSTLMWR 

FMLADNLADCSKMIAEISNGRRMTRERLEENMAKMRQINARMVKSRSHLAATSGESRISP 

FMLADNLADCSKMIAEISNGRRMTRERLEQNMVKMRQINARMVKSRSHLAATSGESRISP 

AMMEAMQHAHRKIWTTELLLTTAAKLQSPKLNGSEIRLLDRHFTLLQTDLQQTVALING 

SMMEAMQHAHRKIVNTTELLLTTAAKLQSPKLNGSEIRLLDRHFTLLQTDLQQTAALING 

RHARRIRIDTAINPELEALAEHLHYQWQGFLWLSTNMRQEISALVILLQRTRRKWLDAHE 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
RHARRIRIDTAINPELEALAEHLHYQWQGFLWLSTNMRQEISALVILLQRTRRKWLDAHE 

RQHLRQSLLETREHGX 
I I I I I I I I I I I I I I I I 
RQHLRQSLLETREHGX 

Furthermore, ORF146ng-l shows homology with a hypothetical E.coli protein: 

sp|P33011|YEEA_ECOLI HYPOTHETICAL 40.0 KD PROTEIN IN COBU-SBMC INTERGENIC REGION 
>gi|1736674 |gnl|PID|dl016553 (D90838) ORF_ID:o348#20; similar to [SwissProt 
Accession Nijmber P33011] [Escherichia coli] >gi 1 1736682 | gnl t PID | dl016560 (D90839) 
ORF_ID:o34B#20; similar to [SwissProt Accession Number P33011] [Escherichia coli] 



orfl46-l. 


■pep 






orfl46-l. 


■pep 


orfl46ng- 


-1 


orfl46-l. 


.pep 


orf 146ng- 




orfl46-l. 


■pep 


orfl46ng- 


-1 


orfl46-l, 


■pep 


orfl46ng- 


-1 


orfl46-l. 


■pep 


orfl46ng- 


-1 


orfl46-l 


■pep 


orf 14 6ng- 


-1 
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>gi I 1788318 (ftE000292) f352; 100% identical to fragment YEEA_ECOLI SW: P33011 but 
has 203 additional C-terminal residues [Escherichia coli] Length = 352 
Score = 109 bits (271), Expect = 2e-23 

Identities = 89/347 (25%), Positives = 150/347 (42%), Gaps = 21/347 (6%) 

Query: 20 YRHRRLIHAVRLGGTVLFATALARLLHLQHGEWIGMTVFWLGMLQFQGAIYSNAVERML 79 

YRH R++H R+ L + RL + W +T+ V++G + F G + A ER+ 
Sbjct: 15 YRHYRIVHGTRVALAFLLTFLIIRLFTIPESTWPLVTMWIMGPISFWGNVVPRAFERIG 74 

Query: 80 GTVIGLGAGLGVLWLNQHYFHGNLLFYLTIGTASALAGWAAVGKNGYVPMLAGLTMCMLI 139 

GTV+G GL L L L + A L GW A+GK Y +L G+T+ ++ + 

Sbjct: 75 GTVLGSILGLIALQLE LISLPLMLVWCAAAMFLCGWLALGKKPYQGLLIGVTLAIW 131 

Query: 140 GDNGSEWLDSGLMRAMNVLIGXXXXXXXXKLLPLKSTLMWRFMLADNLADCSKMIAEISN 199 

G E +D+ L R+ +V++G + P ++ + WR LA +L + +++ + 

Sbjct: 132 GSPTGE-IDTALWRSGDVILGSLLAMLFTGIWPQRAFIHWRIQLAKSLTEYNRVYQSAFS 190 

Query: 200 GRRMTRERLEQNMVKMRQINARMVKSRSHLAATSGESRISPSMMEAMQHAHRKIVNXXXX 259 

+ R RLE ++ K+ VK R +A S E+RI S+ E +Q +R +V 

Sbjct: 191 PNLLERPRLESHLQKLL TDAVKMRGLIAPASKETRIPKSIYEGIQTINRNLVCMLEL 24 7 

Query: 260 XXXXXXXXQSPK LNGSEIRLLDRHFXXXXXXXXXXAALINGRHARRIRIDTAINPEL 316 

+ LN ++R D AL G +N + 

Sbjct: 248 QINAYWATRPSHFVLLNAQKLR--DTQHMMQQILLSLVHALYEGNPQPVFANTEKLNDAV 305 

Query: 317 EALAEHL— HYQWQ GFLWLSTNMRQEISALVILLQRTRRK 354 

E L + L H+ + G++WL+ ++ L L+ R RK 

Sbjct: 306 EELRQLLNNHHDLKWETPIYGYVWLNMETAHQLELLSNLICRALRK 352 

On the basis of this analysis, including the identification of several transmembrane domains in the 
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 76 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 639> 

1 . . GCCGAAGACA CGCGCGTTAC CGCACAGCTT TTGAGCGCGT ACGGCATTCA 
51 GGGCAAACTC GTCAGTGTGC GCGAACACAA CGAACGGCAG ATGGCGGACA 
101 AGATTGTCGG CTATCTTTCA GACGGCATGG TTGTGGCACA GGTTTCCGAT 
151 GCGGGTACGC CGGCCGTGTG CGACCCGGGC GCGAAACTCG CCCGCCGCGT 
201 GCGTGAGGCC GGGTTTAAAG TCGTTCCCGT CGTGGGCGCA AC.GCGGTGA 
251 TGGCGGCTTT GAGCGTGGCC GGTGTGGAAG GATCCGATTT TTATTTCAAC 
301 GGTTTTGTAC CGCCGAAATC GGGAGAACGC AGGAAACTGT TTGCCAAATG 
351 GGTGCGGGCG GCGTTTCCTA TCGTCATGTT TGAAACGCCG CACCGCATCG 
4 01 GTGCAGCGCT TGCCGATATG GCGGAACTGT TCCCCGAACG CCGATTAATG 
451 CTGGCGCGCG AAATTACGAA AACGTTTGAA ACGTTCTTAA GCGGCACGGT 
501 TGGGGAAATT CAGACGGCAT TGTCTGCCGA CGGCGACCAA TCGCGCGGCG 
551 AGATGGTGTT GGTGCTTTAT CCGGCGCAGG ATGAAAAACA CGAAGGCTTG 
601 TCCGAGTCCG CGCAAAACAT CATGAAAATC CTCACAGCCG AGCTGCCGAC 
651 CAAACAGGCG GCGGAGCTTG CTGCCAAAAT CACGGGCGAG GGAAAGAAAG 
701 CTTTGTACGA T. . 

This corresponds to the amino acid sequence <SEQ ID 640; ORF147>: 

1 . .AEDTRVTAQL LSAYGIQGKL VSVREHNERQ MADKIVGYLS DGMWAQVSD 
51 AGTPAVCDPG AKLARRVREA GFKWPWGA XAVMAALSVA GVEGSDFYFN 
101 GFVPPKSGER RKLFAKWVRA AFPIVMFETP HRIGAALADM AELFPERRLM 
151 LAREITKTFE TFLSGTVGEI QTALSADGDQ SRGEMVLVLY PAQDEKHEGL 
201 SE3AQNIMKI LTAELPTKQA AELAAKITGE GKKALYD.. 

Further work revealed the complete nucleotide sequence <SEQ ID 641>: 

1 ATGTTTCAGA AACATTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC 

51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCGGAC ATTACCCTGC 

101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATCTGTGC CGAAGACACG 

151 CGCGTTACCG CACAGCTTTT GAGCGCGTAC GGCATTCAGG GCAAACTCGT 
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2 01 CAGTGTGCGC GAACACAACG AACGGCAGAT GGCGGACAAG ATTGTCGGCT 

251 ATCTTTCAGA CGGCATGGTT GTGGCACAGG TTTCCGATGC GGGTACGCCG 

301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GTGAGGCCGG 

351 GTTTAAAGTC GTTCCCGTCG TGGGCGCAAG CGCGGTGATG GCGGCTTTGA 

4 01 GCGTGGCCGG TGTGGAAGGA TCCGATTTTT ATTTCAACGG TTTTGTACCG 

4 51 CCGAAATCGG GAGAACGCAG GAAACTGTTT GCCAAATGGG TGCGGGCGGC 

501 GTTTCCTATC GTCATGTTTG AAACGCCGCA CCGCATCGGT GCGACGCTTG 

551 CCGATATGGC GGAACTGTTC CCCGAACGCC GATTAATGCT GGCGCGCGAA 

601 ATTACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA 

651 GACGGCATTG TCTGCCGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG 

7 01 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCCGCG 

7 51 CAAAACATCA TGAAAATCCT CACAGCCGAG CTGCCGACCA AACAGGCGGC 

8 01 GGAGCTTGCT GCCAAAATCA CGGGCGAGGG AAAGAAAGCT TTGTACGATC 
851 TGGCTCTGTC TTGGAAAAAC AAATAG 

This corresponds to the amino acid sequence <SEQ ID 642; ORF147-l>: 



1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT 

51 RVTAQLLSAY GIQGKLVSVR EHNERQMADK IVGYLSDGMV VAQVSDAGTP 

101 AVCDPGAKLA RRVREAGFK V VPWGASAVM AALSVA GVEG SDFYimGFVP 

151 PKSGERRKLF AKWVRAAFPI VMFETPHRIG ATLADMAELF PERRLMLARE 

201 ITKTFETFLS GTVGEIQTAL SADGNQSRGE MVLVLYPAQD EKHEGLSESA 

251 QNIMKILTAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with hypothetical protein ORF286 of E.coli (accession number U18997) 
ORF147 and E.coli ORF286 protein show 36% aa identity in 237aa overlap: 



Orfl47: 1 AEDTRVTAQLLSAYGIQGKLVSVREHNERQMADKIVGYLSDG^4WAQVSDAGTPAVCDPG 60 

AEDTR T LL +GI +L ++ +HNE+Q A+ ++ L +G +A VSDAGTP + DPG 
Orf286: 43 AEDTRHTGLLLQHFGINARLFALHDHNEQQKAETLLAKLQEGQNIALVSDAGTPLINDPG 102 

Orfl47: 61 AKLARRVREXXXXXXXXXXXXXXXXXXXXXXXEGSDFYFNGFVPPKSGERRKLFAKWVRA 120 

L R RE F + GF+P KS RR 

Orf286: 103 YHLVRTCREAGIRWPLPGPCAAITALSAAGLPSDRFCYEGFLPAKSKGRRDALKAIEAE 162 



Orfl47: 121 AFPIVMFETPHRIGAALADMAELFPERR-LMLAREITKTFETFLSGTVGEIQTALSADGD 179 

++ +E+ HR+ +L D+ + E R ++LARE+TKT+ET VGE+ + D + 

Orf286: 163 PRTLIFYESTHRLLDSLEDIVAVLGESRYWLARELTKTWETIHGAPVGELLAWVKEDEN 222 

Orfl47: 180 QSRGEMVLVLYPAQDEKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALY 236 

+ +GEMVL++ + E L A + +L AELP K+AA LAA+I G K ALY 

Orf286: 223 RRKGEMVLIV-EGHKAQEEDLPADALRTLALLQAELPLKKAAALAAEIHGVKKNALY 278 

Homology with a predicted ORF from N.meninsitidis (strain A) 

ORF147 shows 96.6% identity over a 237aa overlap with ORF75a from strain A oiN. meningitidis: 



orfl47 .pep 
orf75a 



AE DTRVTAQLL SAYGI QGKLVS VREHNERQ 
TLYWATPIGNLADITLRALAVLQECADIICAEDTRVTAQLLSAYGIQGKLVSVREHNERQ 



orf 147 .pep 
orf75a 



MADKI VGYLS DGMWAQVS DAGT PAVCDPGAKLARRVREAGFK WPWGAXAVMAALSVA 
MADKIVGYLSDGMWAQVSDAGTPAVCDPGAKLARRVREVGF KWPWGASAVMAALSVA 



orf 147 .pep 
orf75a 



GVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIVMFETPHRIGAALADMAELFPERRLM 
GVAGSDFYFNGFVPPKSGERRKLFAKWVRVAFPWMFETPHRIGATLADMAELFPERRLM 



orf 147 .pep 



160 170 180 190 200 210 

LAREITKTFETFLSGTVGEIQTALSADGDQSRGEMVLVLYPAQDEKHEGLSESAQNIMKI 
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orf 147 .pep 
orf75a 



LTAELPTKQAAELAAKITGEGKKALYD 
I I I I I I I I I I I I I I I I I I I I I I I I I I I 
LTAELPTKQAAELAAKITGEGKKALYDLALSWKNKX 



ORF147a is identical to ORF75a, which includes aa 56-292 of ORF75. 
Homology with a predicted ORF from N.sonorrhoeae 

ORF147 shows 94.1% identity over a 237aa overlap with a predicted ORF (ORF147ng) from A^. 
gonorrhoeae: 

orf 147. pep AEDTRVTAQLLSAYGIQGKLVSVREHNERQ 30 

I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I 
orfl47ng TLYWATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAYGIQGRLVSVREHNERQ 85 

orf 147. pep MADKIVGYLSDGMWAQVSDAGTPAVCDPGAKLARRVREAGFKWPWGAXAVMAALSVA 90 

arfl47ng MADKVIGFLSDGLVVAQVSDAGTPAVCDPGAKLARRVREAGFKWPWGASAVMAALSVA 145 

orf 147 .pep GVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIVMFETPHRIGAALADMAELFPERRLM 150 

orfl47ng GVAESDFYFNGFVPPKSGERRKLFAKWVRAAFPWMFETPHRIGATLADMAELFPERRLM 205 

orf 147. pep LAREITKTFETFLSGTVGEIQTALSADGDQSRGEMVLVLYPAQDEKHEGLSESAQNIMKI 210 

orfl47ng LAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQDEKHEGLSESAQNAMKI 2 65 

orf 147. pep LTAELPTKQAAELAAKITGEGKKALYD 237 

orfl47ng LAAELPTKQAAELAAKITGEGKKALYDLALSWKNK 300 

An ORF147ng nucleotide sequence <SEQ ID 643> was predicted to encode a protein having amino 
acid sequence <SEQ ID 644>: 

1 MSVFQTAFFM FQKHLQKASD SVVGGTLYVV ATPIGNLADI TLRALAVLQK 

51 ADIICAEDTR VTAQLLSAYG IQGRLVSVRE HNERQMADKV IGFLSDGLVV 

101 AQVSDAGTPA VCDPGAKLAR RVREAGFK VV PVVGASAVMA ALSVA GVAES 

151 DFYFNGFVPP KSGERRKLFA KWVRftAFPVV MFETPHRIGA TLADMAELFP 

201 ERRLMLAREI TKTFETFLSG TVGEIQTALA ADGNQSRGEM VLVLYPAQDE 

251 KHEGLSESAQ NAMKILAAEL PTKQAAELAA KITGEGKKAL YDLALSWKNK 



Further work revealed the following gonococcal DNA sequence <SEQ ID 645>: 



ATGTTTCAGA 
ATTATACGTG 
GCGCTTTGGC 
CGCGTTACTG 
CAGTGTGCGC 
TCCTTTCAGA 
GCCGTGTGCG 
GTTCAAAGTC 
GTGTGGCCGG 
CCGAAATCGG 
ATTTCCTGTC 
CCGATATGGC 
ATCACGAAAA 
GACGGCATTG 
TGCTTTATCC 
CAAAATGCGA 
GGAGCTTGCC 
TGGCACTGTC 



AACACTTGCA 
GTTGCCACGC 
GGTATTGCAA 
CGCAGCTTTT 
GAACACAACG 
CGGCCTGGTT 
ACCCGGGCGC 
GTTCCCGTCG 
TGTGGCGGAA 
GCGAACGTAG 
GTCATGTTTG 
GGAATTGTTC 
CGTTTGAAAC 
GCGGCGGACG 
GGCGCAGGAT 
TGAAAATCCT 
GCCAAGATTA 
GTGGAAAAAC 



GAAAGCCTCC 
CCATCGGCAA 
AAGGCGGACA 
GAGCGCGTAC 
AGCGGCAGAT 
GTGGCGCAGG 
GAAACTCGCC 
TGGGCGCAAG 
TCCGATTTTT 
GAAATTGTTT 
AAACGCCGCA 
CCCGAACGCC 
GTTCTTAAGC 
GCAACCAATC 
GAAAAACACG 
TGCGGCCGAG 
CAGGTGAGGG 
AAATGA 



GACAGCGTCG 
TTTGGCAGAC 
TCATTTGTGC 
GGCATTCAGG 
GGCGGACAAG 
TTTCCGATGC 
CGCCGCGTGC 
CGCGGTAATG 
ATTTCAACGG 
GCCAAATGGG 
CCGAATCGGG 
GTCTGATGCT 
GGCACGGTTG 
GCGCGGCGAG 
AAGGCTTGTC 
CTGCCGACCA 
CAAAAAGGCT 



TCGGAGGGAC 
ATTACCCTGC 
CGAAGACACG 
GCAGGTTGGT 
GTAATCGGTT 
GGGTACGCCG 
GCGAAGCAGG 
GCGGCGTTGA 
TTTTGTACCG 
TGCGGGCGGC 
GCAACGCTTG 
GGCGCGCGAA 
GGGAAATTCA 
ATGGTGTTGG 
CGAGTCTGCG 
AGCAGGCGGC 
TTGTACGATT 
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This corresponds to the amino acid sequence <SEQ ID 646; ORF147ng-l>: 

1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT 

51 RVTAQLLSAY GIQGRLVSVR EHNERQMADK VIGFLSDGLV VAQVSDAGTP 

101 AVCDPGAKLA RRVREAGFK V VPVVGASAVM AALSVA GVAE SDFYFNGFVP 

151 PKSGERRKLF AKWVRAAFPV VMFETPHRIG ATLADMAELF PERRLMLARE 

201 ITKTFETFLS GTVGEIQTAL AADGNQSRGE MVLVLYPAQD EKHEGLSESA 

251 QNAMKILAAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K* 

ORF147ng shows homology to a hypothetical E.coli protein: 

sp|P45528|YRAL_ECOLI HYPOTHETICAL 31.3 KD PROTEIN IN AGAI-MTR INTERGENIC REGION 
(F286) 

>gi I 606086 (U18997) ORF_f286 [Escherichia coli] 

>gi 1 1789535 (AE000395) hypothetical 31.3 kD protein in agai-mtr intergenic region 
[Escherichia coli] Length = 286 
Score = 218 bits (550), Expect = 3e-56 

Identities = 128/284 (45%), Positives = 171/284 (60%), Gaps = 4/284 (1%) 

Query: 4 KHLQKASDSWGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAYGIQ 63 

K Q A +S G LY+V TPIGNLADIT RAL VLQ D+I AEDTR T LL +GI 
Sbjct: 2 KQHQSADNSQ— GQLYIVPTPIGNLADITQRALEVLQAVDLIAAEDTRHTGLLLQHFGIN 59 

Query: 64 GRLVSVREHNERQMADKVIGFLSDGLVVAQVSDAGTPAVCDPGAKLARRVREAGFKWPV 123 

RL ++ +HNE+Q A+ ++ L +G +A VSDAGTP + DPG L R REAG +WP+ 
Sbjct: 60 ARLFALHDHNEQQKAETLLAKLQEGQNIALVSDAGTPLINDPGYHLVRTCREAGIRWPL 119 

Query: 124 VGASAVMAALSVAGVAESDFYFNGFVPPKSGERRKLFAKWVRAAFPVVMFETPHRIGATL 183 
G A + ALS AG+ F + GF+P KS RR ++ +E+ HR+ +L 

120 PGPCAAITALSAAGLPSDRFCYEGFLPAKSKGRRDALKAIEAEPRTLIFYESTHRLLDSL 17 9 



Sbjc 
Sbjct: 



I ADMAELFPERR-LMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQDEK 242 

D+ + E R ++LARE+TKT+ET VGE+ + D N+ +GEMVL++ + 

) EDIVAVLGESRYWLARELTKTWETIHGAPVGELLAWVKEDENRRKGEMVLIV-EGHKAQ 238 



Query: 243 HEGLSESAQNAMKILAAELPTKQAAELAAKITGEGKKALYDLAL 28 6 

EL A + +L AELP K+AA LAA+I G K ALY AL 
Sbjct: 239 EEDLPADALRTLALLQAELPLKKAAALAAEIHGVKEQJALYKYAL 282 

Based on the computer analysis and the presence of a putative transmembrane domain in the 
gonococcal protein, it is predicted that these proteins from N. meningitidis and N.gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 77 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 647> 

1 ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCGAA 

51 AACCGGTCGC ATCCGCTTCT C.GCTGCTTA CTTAGCCATA TGCCTGTCGT 

101 TCGGCATTCT TCCCCAAGCC TGGGCGGGAC ACACTTATTT CGGCATCAAC 

151 TACCAATACT ATCGCGACTT TGCCGAAAAT AAAGGCAAGT TTGCAGTCGG 

201 GGCGAAAGAT ATTGAGGTTT ACAACAAAAA AGGGGAGTTG GTCGGCAAAT 

251 CAATGACAAA AGCCCCGATG ATTGATTTTT CTGTGGTGTC GCGTAACGGC 

301 GTGGCGGcAT TGGTGGGCGt ATCAATATAT TGTGAGCGTG GCACATAACG 

351 GCGGCTATAA CAACGTTGAT TTTGGTGCGG AAGGAAk.AA tATCCC. GAT 

401 CAACAwCGww TTACTTATAA AATTGTGAAA CGGAATAATT ATAAAGCAGG 

451 GACTAAAGGC CATCCTTATG GCGGCGATTA TCATATGCCG CGTTTGCATA 

501 AATwTGTCAC AGATGCAGAA CCTGTTGAAA TGACCAGTTA TATGGATGGG 

551 CGGAAATATA TCGATCAAAA TAATTACCCT GACCGTGTTC GTATTGGGGC 

601 AGGCAGGCAA TATTGGCGAT CTGATGAAGA TGAGCCCAAT AACCGCGAAA 

651 GTTCATATCA TATTGCAAGT 

701 GGCTC ACCAATGTTT ATCTATGATG CCCAAAAGCA 

751 AAAGTGGTTA ATTAATGGGG TATTGCAAAC GGGCAACCCC TATATAGGAA 

801 AAAGCAATGG CTTCCAGCTG GTTCGTAAAG ATTGGTTCTA TGATGAAATC 

851 TTTGCTGGAG ATACCCATTC AGTATTCTAC GAACCACGTC AAAATGGGAA 

901 ATACTCTTTT AACGACGATA ATAATGGCAC AGGAAAAATC AATGCCAAAC 
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951 ATGAACACAA TTCTCTGCCT AATAGATTAA AAACACGAAC CGTTCAATTG 

1001 TTTAATGTTT CTTTATCCGA GACAGCAAGA GAACCTGTTT ATCATGCTGC 

1051 AGGTGGTGTC AACAGTTATC GACCCAGACT GAATAATGGA GAAAATATTT 

1101 CCTTTATTGA CGAAGGAAAA GGCGAATTGA TACTTACCAG CAACATCAAT 

1151 CAAGGTGCTG GAGGATTATA TTTCCAAGGA GATTTTACGG TCTCGCCTGA 

1201 AAATAACGAA ACTTGGCAAG GCGCGGGCGT TCATATCAGT GAAGACAGTA 

1251 CCGTTACTTG GAAAGTAAAC GGCGTGGCAA ACGACCGCCT GTCCAAAATC 

1301 GGCAAAGGCA CGCTG 

// 

2101 GATAAAG 

2151 TGACTGCTTC ATTGACTAAG ACCGACATCA GCGGCAATGT CGATCTTGCC 

2201 GATCACGCTC ATTTAAATCT CACAGGGCTT GCCACACTCA ACGGCAATCT 

2251 TAGTGCTiAAT GGCGATACAC GTTATACAGT CAGCCACAAC GCCACCCAAA 

2301 ACGGCAACCk TAgCCtCGtG G.sAATGcCC AAGCAACATT TAATCAAGCC 

2351 ACATTAAACG GCAACACATC GGCTTCgGGC AATGCTTCAT TTAATCTAAG 

2401 CGACCACGCC GTACAAAACG GCAGTCTGAC GCTTTCCGGC AACGCTAAGG 

2451 CAAACGTAAG CCATTCCGCA CTCAACGGTA ATGTCTCCCT AGCCGATAAG 

2501 GCAGTATTCC ATTTTGAAAG CAGCCGCTTT ACCGGACAAA TCAGCGGCGG 

2551 CAagGATACG GCATTACACT TAAAAGACAG CGAATGGACG CTGCCGTCAg 

2601 GarCGGAATT AGGCAATTTA AACCTTGACA ACGCCACCAT TACaCTCAAT 

2 651 TCCGCCTATC GCCACGATGC GGCAGGGGCG CAAACCGGCA GTGCGACAGA 
2701 TGCGCCGCGC CGCCGTTCGC GCCGTTCGCG CCGTTCCCTA TTATmCGTTA 
2751 CACCGCCAAC TTCGGTAGAA TCCCGTTTCA ACACGCTGAC GGTAAACGGC 
2801 AAATTGAACG GTCAGGGAAC ATTCCGCTTT ATGTCGGAAC TCTTCGGCTA 
2851 CCGCAGCGAC AAATTGAAGC TGGCGGAAAG TTCCGAAGGC ACTTACACCT 
2901 TGGCGGTCAA CAATACCGGC AACGAACCTG CAAGCCTCGA ACAATTGACG 
2951 GTAGTGGAAG GAAAAGACAA CAAACCGCTG TCCGAAAACC TTAATTTCAC 
3001 CCTGCAAAAC GAACACGTCG ATGCAGGCGC GTGG 

// 

3551 TTAGAC CGCGTATTTG CCGAAGACCG 

3601 CCGCAACGCC GTTTGGACAA GCGGCATCCG GGACACCAAA CACTACCGTT 

3651 CGCAAGATTT CCGCGCCTAC CGCCAACAAA CCGACCTGCG CCAAATCGGT 

3701 ATGCAGAAAA ACCTCGGCAG CGGGCGCGTC GGCATCCTGT TTTCGCACAA 

3751 CCGGACCGAA AACACCTTCG ACGACGGCAT CGGCAACTCG GCACGGCTTG 

3801 CCCACGGCGC CGTTTTCGGG CAATACGGCA TCGACAGGTT CTACATCGGC 

3851 ATCAGnCGCG GGCGCGGGTT TTAGCAGCGG CAGCCTTTcA GACGGCATCG 

3 901 GAGsmAAAwT CCGCCGCCGC GTGCtGCATT ACGGCATTCA GGCACGAtAC 
3951 CGCGCCGgtt tCggCGgATt CGGCATCGAA CCGCACATCG GCGCAACGCg 
4001 ctATTTCGTC CAAAAAGCGG ATTACCGCTA CGAAAACGTC AATATCGCCA 
4051 CCCCCGGCCT TGCATTCAAC CGcTACCGCG CGGGCATTAa GGCAGATTAT 
4101 TCATTCAAAC CGGCGCAACA CATTTCCATC ACGCCTTATT TGAGCCTGTC 
4151 CTATACCGAT GCCGCTTCGG GCAAAGTCCG AACACGCGTC AATACCGCCG 
4201 TATTGGCTCA GGATTTCGGC AAAACCCGCA GTGCGGAATG GGgCGTAAAC 
4251 GCCGAAATCA AAGGTTTCAC GCTGTCCCTC CACGCTGCCG CCGCCAAAGG 
4301 CCCGCAACTG GAAGCGCAAC ACAGCGCGGG CATCAAATTA GGCTACCGCT 
4351 GGTAA... 

This corresponds to the amino acid sequence <SEQ ID 648; 0RF1>: 



1 MKTTDKRTTE THRKAPKTGR IRFXAAYLAI CLSFGILPQA WAGHTYFGIN 

51 YQYYRDFAEN KGKFAVGAKD lEVYNKKGEL VGKSMTKAPM IDFSWSRNG 

101 VAALVGVQYI VSVAHNGGYN NVDFGAEGXN IXDQXRXTYK IVKRNNYKAG 

151 TKGHPYGGDY HMPRLHKXVT DAEPVEMTSY MDGRKYIDQN NYPDRVRIGA 

201 GRQYWRSDED EPNNRESSYH IAS GS PMFIYDAQKQ 

251 KWLINGVLQT GNPYIGKSNG FQLVRKDWFY DEIFAGDTHS VFYEPRQNGK 

301 YSFNDDNNGT GKINAKHEHN SLPNRLKTRT VQLFNVSLSE TAREPVYHAA 

351 GGVNSYRPRL NNGENISFID EGKGELILTS NINQGAGGLY FQGDFTVSPE 

401 NNETWQGAGV HISEDSTVTW KVNGVANDRL SKIGKGTL 

// 

701 DKVTAS LTKTDISGNV DLADHAHLNL TGLATLNGNL 

751 SANGDTRYTV SHNATQNGNX SLVXNAQATF NQATLNGNTS ASGNASFNLS 

801 DHAVQNGSLT LSGNAKANVS HSALNGNVSL ADKAVFHFES SRFTGQISGG 

851 KDTALHLKDS EWTLPSGXEL GNLNLDNATI TLNSAYRHDA AGAQTGSATD 

901 APRRRSRRSR RSLLXVTPPT SVESRFNTLT VNGKLNGQGT FRFMSELFGY 

951 RSDKLKLAES SEGTYTLAVN NTGNEPASLE QLTWEGKDN KPLSENLNFT 

1001 LQNEHVDAGA W 

// 

1151 LDRVFAEDR 

1201 RNAVWTSGIR DTKHYRSQDF RAYRQQTDLR QIGMQKNLGS GRVGILFSHN 

1251 RTENTFDDGI GNSARLAHGA VFGQYGIDRF YIGISAGAGF SSGSLSDGIG 

1301 XKXRRRVLHY GIQARYRAGF GGFGIEPHIG ATRYFVQBCAD YRYENVNIAT 

1351 PGLAFNRYRA GIKADYSFKP AQHISITPYL SLSYTDAASG KVRTRVNTAV 
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14 01 LAQDFGKTRS AEWGVNAEIK GFTLSLHAAA AKGPQLEAQH SAGIKLGYRW 

1451 * 

Further sequencing analysis revealed the complete nucleotide sequence <SEQ ID 649>: 

1 ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCGAA 

51 AACCGGCCGC ATCCGCTTCT CGCCTGCTTA CTTAGCCATA TGCCTGTCGT 

101 TCGGCATTCT TCCCCAAGCC TGGGCGGGAC ACACTTATTT CGGCATCAAC 

151 TACCAATACT ATCGCGACTT TGCCGAAAAT AAAGGCAAGT TTGCAGTCGG 

201 GGCGAAAGAT ATTGAGGTTT ACAACAAAAA AGGGGAGTTG GTCGGCAAAT 

251 CAATGACAAA AGCCCCGATG ATTGATTTTT CTGTGGTGTC GCGTAACGGC 

301 GTGGCGGCAT TGGTGGGCGA TCAATATATT GTGAGCGTGG CACATAACGG 

351 CGGCTATAAC AACGTTGATT TTGGTGCGGA AGGAAGAAAT CCCGATCAAC 

401 ATCGTTTTAC TTATAAAATT GTGAAACGGA ATAATTATAA AGCAGGGACT 

451 AAAGGCCATC CTTATGGCGG CGATTATCAT ATGCCGCGTT TGCATAAATT 

501 TGTCACAGAT GCAGAACCTG TTGAAATGAC CAGTTATATG GATGGGCGGA 

551 AATATATCGA TCAAAATAAT TACCCTGACC GTGTTCGTAT TGGGGCAGGC 

601 AGGCAATATT GGCGATCTGA TGAAGATGAG CCCAATAACC GCGAAAGTTC 

651 ATATCATATT GCAAGTGCGT ATTCTTGGCT CGTTGGTGGC AATACCTTTG 

701 CACAAAATGG ATCAGGTGGT GGCACAGTCA ACTTAGGTAG TGAAAAAATT 

751 AAACATAGCC CATATGGTTT TTTACCAACA GGAGGCTCAT TTGGCGACAG 

801 TGGCTCACCA ATGTTTATCT ATGATGCCCA AAAGCAAAAG TGGTTAATTA 

851 ATGGGGTATT GCAAACGGGC AACCCCTATA TAGGAAAAAG CAATGGCTTC 

901 CAGCTGGTTC GTAAAGATTG GTTCTATGAT GAAATCTTTG CTGGAGATAC 

951 CCATTCAGTA TTCTACGAAC CACGTCAAAA TGGGAAATAC TCTTTTAACG 

1001 ACGATAATAA TGGCACAGGA AAAATCAATG CCAAACATGA ACACAATTCT 

1051 CTGCCTAATA GATTAAAAAC ACGAACCGTT CAATTGTTTA ATGTTTCTTT 

1101 ATCCGAGACA GCAAGAGAAC CTGTTTATCA TGCTGCAGGT GGTGTCAACA 

1151 GTTATCGACC CAGACTGAAT AATGGAGAAA ATATTTCCTT TATTGACGAA 

1201 GGAAAAGGCG AATTGATACT TACCAGCAAC ATCAATCAAG GTGCTGGAGG 

1251 ATTATATTTC CAAGGAGATT TTACGGTCTC GCCTGAAAAT AACGAAACTT 

1301 GGCAAGGCGC GGGCGTTCAT ATCAGTGAAG ACAGTACCGT TACTTGGAAA 

1351 GTAAACGGCG TGGCAAACGA CCGCCTGTCC AAAATCGGCA AAGGCACGCT 

1401 GCACGTTCAA GCCAAAGGGG AAAACCAAGG CTCGATCAGC GTGGGCGACG 

1451 GTACAGTCAT TTTGGATCAG CAGGCAGACG ATAAAGGCAA AAAACAAGCC 

1501 TTTAGTGAAA TCGGCTTGGT CAGCGGCAGG GGTACGGTGC AACTGAATGC 

1551 CGATAATCAG TTCAACCCCG ACAAACTCTA TTTCGGCTTT CGCGGCGGAC 

1601 GTTTGGATTT AAACGGGCAT TCGCTTTCGT TCCACCGTAT TCAAAATACC 

1651 GATGAAGGGG CGATGATTGT CAACCACAAT CAAGACAAAG AATCCACCGT 

1701 TACCATTACA GGCAATAAAG ATATTGCTAC AACCGGCAAT AACAACAGCT 

1751 TGGATAGCAA AAAAGAAATT GCCTACAACG GTTGGTTTGG CGAGAAAGAT 

1801 ACGACCAAAA CGAACGGGCG GCTCAACCTT GTTTACCAGC CCGCCGCAGA 

1851 AGACCGCACC CTGCTGCTTT CCGGCGGAAC AAATTTAAAC GGCAACATCA 

1901 CGCAAACAAA CGGCAAACTG TTTTTCAGCG GCAGACCAAC ACCGCACGCC 

1951 TACAATCATT TAAACGACCA TTGGTCGCAA AAAGAGGGCA TTCCTCGCGG 

2001 GGAAATCGTG TGGGACAACG ACTGGATCAA CCGCACATTT AAAGCGGAAA 

2051 ACTTCCAAAT TAAAGGCGGA CAGGCGGTGG TTTCCCGCAA TGTTGCCAAA 

2101 GTGAAAGGCG ATTGGCATTT GAGCAATCAC GCCCAAGCAG TTTTTGGTGT 

2151 CGCACCGCAT CAAAGCCACA CAATCTGTAC ACGTTCGGAC TGGACGGGTC 

2201 TGACAAATTG TGTCGAAAAA ACCATTACCG ACGATAAAGT GATTGCTTCA 

2251 TTGACTAAGA CCGACATCAG CGGCAATGTC GATCTTGCCG ATCACGCTCA 

2301 TTTAAATCTC ACAGGGCTTG CCACACTCAA CGGCAATCTT AGTGCAAATG 

2351 GCGATACACG TTATACAGTC AGCCACAACG CCACCCAAAA CGGCAACCTT 

2401 AGCCTCGTGG GCAATGCCCA AGCAACATTT AATCAAGCCA CATTAAACGG 

2451 CAACACATCG GCTTCGGGCA ATGCTTCATT TAATCTAAGC GACCACGCCG 

2501 TACAAAACGG CAGTCTGACG CTTTCCGGCA ACGCTAAGGC AAACGTAAGC 

2551 CATTCCGCAC TCAACGGTAA TGTCTCCCTA GCCGATAAGG CAGTATTCCA 

2601 TTTTGAAAGC AGCCGCTTTA CCGGACAAAT CAGCGGCGGC AAGGATACGG 

2651 CATTACACTT AAAAGACAGC GAATGGACGC TGCCGTCAGG CACGGAATTA 

2701 GGCAATTTAA ACCTTGACAA CGCCACCATT ACACTCAATT CCGCCTATCG 

2751 CCACGATGCG GCAGGGGCGC AAACCGGCAG TGCGACAGAT GCGCCGCGCC 

2801 GCCGTTCGCG CCGTTCGCGC CGTTCCCTAT TATCCGTTAC ACCGCCAACT 

2851 TCGGTAGAAT CCCGTTTCAA CACGCTGACG GTAAACGGCA AATTGAACGG 

2901 TCAGGGAACA TTCCGCTTTA TGTCGGAACT CTTCGGCTAC CGCAGCGACA 

2 951 AATTGAAGCT GGCGGAAAGT TCCGAAGGCA CTTACACCTT GGCGGTCAAC 

3001 AATACCGGCA ACGAACCTGC AAGCCTCGAA CAATTGACGG TAGTGGAAGG 

3051 AAAAGACAAC AAACCGCTGT CCGAAAACCT TAATTTCACC CTGCAAAACG 

3101 AACACGTCGA TGCCGGCGCG TGGCGTTACC AACTCATCCG CAAAGACGGC 

3151 GAGTTCCGCC TGCATAATCC GGTCAAAGAA CAAGAGCTTT CCGACAAACT 

3201 CGGCAAGGCA GAAGCCAAAA AACAGGCGGA AAAAGACAAC GCGCAAAGCC 

3251 TTGACGCGCT GATTGCGGCC GGGCGCGATG CCGTCGAAAA GACAGAAAGC 

3301 GTTGCCGAAC CGGCCCGGCA GGCAGGCGGG GAAAATGTCG GCATTATGCA 
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3351 GGCGGAGGAA GAGAAAAAAC GGGTGCAGGC GGATAAAGAC ACCGCCTTGG 

3401 CGAAACAGCG CGAAGCGGAA ACCCGGCCGG CTACCACCGC CTTCCCCCGC 

3451 GCCCGCCGCG CCCGCCGGGA TTTGCCGCAA CTGCAACCCC AACCGCAGCC 

3501 CCAACCGCAG CGCGACCTGA TCAGCCGTTA TGCCAATAGC GGTTTGAGTG 

3551 AATTTTCCGC CACGCTCAAC AGCGTTTTCG CCGTACAGGA CGAATTAGAC 

3 601 CGCGTATTTG CCGAAGACCG CCGCAACGCC GTTTGGACAA GCGGCATCCG 

3 651 GGACACCAAA CACTACCGTT CGCAAGATTT CCGCGCCTAC CGCCAACAAA 

3701 CCGACCTGCG CCAAATCGGT ATGCAGAAAA ACCTCGGCAG CGGGCGCGTC 

3751 GGCATCCTGT TTTCGCACAA CCGGACCGAA AACACCTTCG ACGACGGCAT 

3801 CGGCAACTCG GCACGGCTTG CCCACGGCGC CGTTTTCGGG CAATACGGCA 

3 851 TCGACAGGTT CTACATCGGC ATCAGCGCGG GCGCGGGTTT TAGCAGCGGC 

3 901 AGCCTTTCAG ACGGCATCGG AGGCAAAATC CGCCGCCGCG TGCTGCATTA 

3 951 CGGCATTCAG GCACGATACC GCGCGGGTTT CGGCGGATTC GGCATCGAAC 
4001 CGCACATCGG CGCAACGCGC TATTTCGTCC AAAAAGCGGA TTACCGCTAC 
4051 GAAAACGTCA ATATCGCCAC CCCCGGCCTT GCATTCAACC GCTACCGCGC 
4101 GGGCATTAAG GCAGATTATT CATTCAAACC GGCGCAACAC ATTTCCATCA 
4151 CGCCTTATTT GAGCCTGTCC TATACCGATG CCGCTTCGGG CAAAGTCCGA 
4201 ACACGCGTCA ATACCGCCGT ATTGGCTCAG GATTTCGGCA AAACCCGCAG 

4 251 TGCGGAATGG GGCGTAAACG CCGAAATCAA AGGTTTCACG CTGTCCCTCC 
4 301 ACGCTGCCGC CGCCAAAGGC CCGCAACTGG AAGCGCAACA CAGCGCGGGC 
4351 ATCAAATTAG GCTACCGCTG GTAA 

This corresponds to the amino acid sequence <SEQ ID 650; 0RF1-1>: 



1 MKTTDKRTTE THRKAPKTGR IRFSPAYLAI CLSFGIL PQA WAGHTYFGIN 

51 YQYYRDFAEN KGKFAVGAKD lEVYNKKGEL VGKSMTKAPM IDFSWSRNG 

101 VAALVGDQYI VSVAHNGGYN NVDFGAEGRN PDQHRFTYKI VKRNNYKAGT 

151 KGHPYGGDYH MPRLHKFVTD AEPVEMTSYM DGRKYIDQNN YPDRVRIGAG 

201 RQYWRSDEDE PNNRESSYHI ASAYSWLVGG NTFAQNGSGG GTVNLGSEKI 

251 KHSPYGFLPT GGSFGDSGSP MFIYDAQKQK WLINGVLQTG NPYIGKSNGF 

301 QLVRKDWFYD EIFAGDTHSV FYEPRQNGKY SFNDDNNGTG KINAKHEHNS 

351 LPNRLKTRTV QLFNVSLSET AREPVYHAAG GVNSYRPRLN NGENISFIDE 

401 GKGELILTSN INQGAGGLYF QGDFTVSPEN NETWQGAGVH ISEDSTVTWK 

451 VNGVANDRLS KIGKGTLHVQ AKGENQGSIS VGDGTVILDQ QADDKGKKQA 

501 FSEIGLVSGR GTVQLNADNQ FNPDKLYFGF RGGRLDLNGH SLSFHRIQNT 

551 DEGAMIVNHN QDKESTVTIT GNKDIATTGN NNSLDSKKEI AYNGWFGEKD 

601 TTKTNGRLNL VYQPAAEDRT LLLSGGTNLN GNITQTNGKL FFSGRPTPHA 

651 YNHLNDHWSQ KEGIPRGEIV WDNDWINRTF KAENFQIKGG QAWSRNVAK 

701 VKGDWHLSNH AQAVFGVAPH QSHTICTRSD WTGLTNCVEK TITDDKVIAS 

751 LTKTDISGNV DLADHAHLNL TGLATLNGNL SANGDTRYTV SHNATQNGNL 

801 SLVGNAQATF NQATLNGNTS ASGNASFNLS DHAVQNGSLT LSGNAKANVS 

851 HSALNGNVSL ADKAVFHFES SRFTGQISGG KDTALHLKDS EWTLPSGTEL 

901 GNLNLDNATI TLNSAYRHDA AGAQTGSATD APRRRSRRSR RSLLSVTPPT 

951 SVESRFNTLT VNGKLNGQGT FRFMSELFGY RSDKLKLAES SEGTYTLAW 

1001 NTGNEPASLE QLTVVEGKDN KPLSENLNFT LQNEHVDAGA WRYQLIRKDG 

1051 EFRLHNPVKE QELSDKLGKA EAKKQAEKDN AQSLDALIAA GRDAVEKTES 

1101 V7\EPARQAGG ENVGIMQAEE EKKRVQADKD TALAKQREAE TRPATTAFPR 

1151 ARRARRDLPQ LQPQPQPQPQ RDLISRYANS GLSEFSATLN SVFAVQDELD 

1201 RVFAEDRRNA VWTSGIRDTK HYRSQDFRAY RQQTDLRQIG MQKNLGSGRV 

1251 GILFSHNRTE NTFDDGIGNS ARLAHGAVFG QYGIDRFYIG ISAGAGFSSG 

1301 SLSDGIGGKI RRRVLHYGIQ ARYRAGFGGF GIEPHIGATR YFVQKADYRY 

1351 ENVNIATPGL AFNRYRAGIK ADYSFKPAQH ISITPYLSLS YTDAASGKVR 

1401 TRVNTAVLAQ DFGKTRSAEW GVNAEIKGFT LSLHAAAAKG PQLEAQHSAG 

1451 IKLGYRW* 

Computer analysis of these sequences gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A') 

ORFl shows 57.8% identity over a 1456aa overlap with an ORF (ORFla) from sttain A of A^. 
meningitidis: 



10 20 30 40 50 60 

or f 1 . pep MKTTDKRTTETHRKAPKTGR IRFXAAYLAICLSFGIL PQAWAGHTYFGINYQYYRDFAEN 

o r f 1 a MKTT DKRTTETHRKAPKTGR IRFSPAYLAICLSFGIL PQAWAGHTYFG IN YQYYRDFAEN 

10 20 30 40 50 60 



70 80 90 100 110 120 

KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALVGVQYIVSVAHNGGYN 
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KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSVVSRNGVAALVGDQYIVSVAHNGGYN 



orfla 



NVDFGAEGXNIXDQXRXTYKIVKRNNYKAGTKGHPYGGDYHMPRLHKXVTDAEPVEMTSY 
NVDFGAEGXN-PDQHRFSYQIVKRNNYKPDNS-HPYNGDXHMPRLHKFVTDAEPVEMTSD 



MDGRKYIDQNNYPDRVRIGAGRQYWRSDEDEP NN 

II I I : : : I I : I II I I : I : : III I : I : II 
MRGNTYSDKEKYPERVRIGSGHHYWRYDDDKHGDLSYSGAWLIGGNTHMQGWGNNGVXSL 



orf 1 .pep 
orfla 



RESSYH lA SGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNGFQLVRK 

I : : : : II I I I I I I I I I :: I I I : I I I I I II I I I : I I I I I : I I 

SGDVRHANDYGPMPIAGAAGDSGSPMFIYDKTNNKWLLNGVLQTGYPYSGRENGFQLIRK 



orfl.pep 
orfla 



DWFYDEIFAGDTHSVFYEPRQNGKYSFNDDNNGTGKINAKHEHNSLPNRLKTRTVQLFNV 
DWFYDDIYRGDTHTVXFEPRSNGHFSFTSNNNGTGTVTETNEKVSNP-KLKVQTVRLFDE 



orf 1 .pep 



SLSETAREPVYHAAGGVNSYRPRLNNGENISFIDEGKGELILTSNINQGAGGLYFQGDFT 
SLNETDKEPVY-AAGGVNQYRPRLNNGENLSFIDYGNGKLILSNNINQGAGGLYFEGDFT 



VSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGTL 

VSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGTLHVQAKGENQGSISVGDGT 



orfla 



TNGRLNLVYQPAAEDRTXLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLGSGWSKMEG 



IPQGEIVWDNDWIXRTFKAENFHIQGGQAVISRNVAKVEGDXHLSNHAQAVFGVAPHQSH 



XXXXXDKVTASLTKTDISGNVDLADHAHLNLTGLATLNGNLSAN 

TICTRSDWTGLTNCVEXXITDDKVIASLTKTDXSGXVXLXXXXXXXLXGXAXLXGNLSAN 



490 500 510 520 530 540 

GDTRYTVSHNATQNGNXSLVXNAQATFNQATLNGNTSASGNASFNLSDHAVQNGSLTLSG 
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GDTRYTVSHNATQNGNLSLVGNAQATFNQATLNGNXSXSGNASFNLSNNAAQNGSLTLSD 



NAKANVSHSALNGNVSLADKAVFHFESSRFTGQISGGKDTALHLKDSEWTLPSGXELGNL 
NAKANVSHSALNGNVSLADKAVFHFENSRFTGQLSGSKXTALHLKDSEWTLPSGTELGNL 



NLDNATITLNSAYRHDAAGAQTGSATDAPRRRSRRSRRSLLXVTPPTSVESRFNTLTVNG 
NLDNATITLNSAYRHDAAGAQTGXVSDTPRRRSRRS— LLSVTPPTSVESRFNTLTVNG 



KLNGQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPASLEQLTWEGKDNKPL 



SENLNFTLQNEHVDAGAW-- 



VFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQTDLRQIGMQKNLGSGRVGILFSHNRTEN 



orf 1 . pep 
orfla 



orf 1 . pep RYRAGFGGFGIEPHIGATRYFVQKADYRYENVNIATPGLAFNRYRAGIKADYSFKPAQHI 
1 I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 



orf 1 .pep 
orfla 



70 The complete length ORF 1 a nucleotide sequence <SEQ ID 65 1 > is: 
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1 ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCGAA 

51 AACCGGCCGC ATCCGCTTCT CGCCTGCTTA CTTAGCCATA TGCCTGTCGT 

101 TCGGCATTCT TCCCCAAGCT TGGGCGGGAC ACACTTATTT CGGCATCAAC 

151 TACCAATACT ATCGCGACTT TGCCGAAAAT AAAGGCAAGT TTGCAGTCGG 

201 GGCGAAAGAT ATTGAGGTNT ACAACAAAAA AGGGGAGTTG GTCGGCAAAT 

251 CAATGACAAA AGCCCCGATG ATTGATTTTT CTGTGGTGTC GCGTAACGGC 

301 GTGGCGGCAT TGGTGGGCGA TCAATATATT GTGAGCGTGG CACATAACGG 

351 CGGCTATAAC AACGTTGATT TTGGTGCGGA AGGAAGNAAT CCCGATCAGC 

401 ACCGTTTTTC TTACCAAATT GTGAAAAGAA ATAATTATAA GCCTGACAAT 

4 51 TCACACCCTT ACAACGGCGA TTANCATATG CCGCGTTTGC ATAAATTTGT 

501 CACAGATGCA GAACCTGTCG AAATGACGAG TGACATGAGG GGGAATACCT 

551 ATTCCGATAA AGAAAAATAT CCCGAGCGTG TCCGCATCGG CTCAGGACAC 

601 CACTATTGGC GTTATGATGA TGACAAACAC GGCGATTTAT CCTACTCCGG 

651 CGCATGGTTA ATTGGCGGCA ATACACATAT GCAGGGTTGG GGAAATAATG 

701 GCGTANTTAG TTTGAGCGGC GATGTGCGCC ATGCCAACGA CTATGGCCCT 

751 ATGCCGATTG CAGGTGCGGC AGGCGACAGC GGTTCGCCAA TGTTTATTTA 

801 TGACAAAACA AACAATAAAT GGCTGCTCAA CGGAGTTTTA CAAACCGGCT 

851 ACCCTTATTC CGGCAGGGAA AACGGTTTCC AGCTGATACG CAAAGATTGG 

901 TTCTACGATG ACATTTACAG AGGCGATACA CATACCGTCT NTTTTGAACC 

951 GCGCAGTAAC GGACATTTTT CCTTTACATC CAACAACAAC GGTACGGGTA 

1001 CGGTAACAGA AACCAACGAA AAGGTNTCCA ATCCAAAGCT TAAAGTACAG 

1051 ACAGTCCGAC TGTTTGACGA ATCTTTGAAT GAAACTGATA AAGAACCAGT 

1101 TTACGCGGCA GGGGGTGTTA ATCAGTACCG TCCAAGGTTA AACAACGGTG 

1151 AAAACCTTTC TTTTATCGAT TACGGCAACG GCAAACTCAT CTTATCAAAC 

1201 AACATCAACC AAGGCGCGGG CGGTTTGTAT TTTGAAGGTG ATTTTACGGT 

1251 CTCGCCTGAA AACAACGAAA CGTGGCAAGG CGCGGGCGTT CATATCAGTG 

1301 AAGACAGTAC CGTTACTTGG AAAGTAAACG GCGTGGCAAA CGACCGCCTG 

1351 TCCAAAATCG GCAAAGGCAC GCTGCACGTT CAAGCCAAAG GGGAAAACCA 

14 01 AGGCTCGATC AGCGTGGGCG ACGGTACAGT CATTTTGGAT CAGCAGGCAG 

1451 ACGATAAAGG CAAAAAACAA GCCTTTAGTG AAATCGGCTT GNTCAGCGGC 

1501 AGGGGTACGG TGCAACTGAA TGCCGATAAT CAGTTCAACC CCGACAAACT 

1551 CTATTTCGGC TTTCGCGGCG GACGTTTGGA TTTAAACGGG CATTCGCTTT 

1601 CGTTCCACCG TATTCAAAAT ACCGATGAAG GGGCGATGAT TGNCNATCAT 

1551 AATGCCACAA CAACATCCAC CGTTACCATT ACAGGGAATG AAAGTATTAC 

17 01 ACAACCGAGT GGTAAGAATA TCAATAGACT TAATTACAGC AAAGAAATTG 
1751 CCTACAACGG TTGGTTTGGC GAGAAAGATA CGACCAAAAC GAACGGGCGG 

18 01 CTCAACCTTG TTTACCAGCC CGCCGCAGAA GACCGCACCC NGCTGCTTTC 
1851 CGGCGGAACA AATTTAAACG GCAACATCAC GCAAACAAAC GGCAAACTGT 
1901 TTTTCAGCGG CAGACCGACA CCGCACGCCT ACAATCATTT AGGAAGCGGG 
1951 TGGTCAAAAA TGGAAGGTAT CCCACAAGGA GAAATCGTGT GGGACAACGA 
2 001 CTGGATCNAC CGCACGTTTA AAGCGGAAAA TTTCCATATT CAGGGCGGGC 
2051 AGGCGGTGAT TTCCCGCAAT GTTGCCAAAG TGGAAGGCGA TTGNCATTTG 
2101 AGCAATCACG CCCAAGCAGT TTTTGGTGTC GCACCGCATC AAAGCCATAC 
2151 AATCTGTACA CGTTCGGACT GGACNGGTCT GACAAATTGT GTCGAANAAA 
2201 NCATTACCGA CGATAAAGTG ATTGCTTCAT TGACTAAGAC NGACNTNAGC 
2251 GGCANTGTNA GNCTNNCCNA TNACGNTNNT TNAAANCTCN CNGGGCNTGC 
2 301 NNCACTNAAN GGCAATCTTA GTGCAAATGG CGATACACGT TATACAGTCA 
2351 GCCACAACGC CACCCAAAAC GGCAACCTTA GCCTCGTGGG CAATGCCCAA 
2401 GCAACATTTA ATCAAGCCAC ATTAAACGGC AACNCATCGG NTTCGGGCAA 
2451 TGCTTCATTT AATCTAAGCA ACAACGCCGC ACAAAACGGC AGTCTGACGC 
2501 TTTCCGACAA CGCTAAGGCA AACGTAAGCC ATTCCGCACT CAACGGCAAT 
2551 GTCTCCCTAG CCGATAAGGC AGTATTCCAT TTTGAAAACA GCCGCTTTAC 
2 601 CGGACAACTC AGCGGCAGCA AGGANACAGC ATTACACTTA AAAGACAGCG 
2 651 AATGGACGCT GCCGTCAGGC ACGGAATTAG GCAATTTAAA CCTTGACAAC 
2701 GCCACCATTA CACTCAATTC CGCCTATCGC CACGATGCTG CAGGCGCGCA 
2751 AACCGGCAGN GTGTCAGACA CGCCGCGCCG CCGTTCGCGC CGTTCCCTAT 
2 8 01 TATCCGTTAC ACCGCCAACT TCGGTAGAAT CCCGTTTCAA CACGCTGACG 
2 851 GTAAACGGCA AATTGAACNG TCAAGGAACA TTCCGCTTTA TGTCGGAACT 
2 901 CTTCGGCTAC CGAAGCGACA AATTGAAGCT GGCGGAAAGT TCCGAAGGNA 
2951 CTTACACCTT GGCGGTCAAC AATACCGGCA ACGAACCCGT AAGCCTCGAT 
3001 CAATTGACGG TAGTGGAAGG GAAAGACAAC AAACCGCTGT CCGAAAACCT 
3051 TAATTTCACC CTGCAAAACG AACACGTCGA TGCCGGCGCG TGGCGTTACC 
3101 AACTCATCCG CAAAGACGGC GAGTTCCGCC TGCATAATCC GGTCAAAGAA 
3151 CAAGAGCTTT CCGACAAACT CGGCAAGGCA GAAGCCAAAA AACAGGCGGA 
3201 AAAAGACAAC GCGCAAAGCC TTGACGCGCT GATTGCGGCC GGGCGCGATG 
3251 CCGCCGAAAA GACAGAAAGC GTTGCCGAAC CGGCCCGGCN GGCAGGCGGG 
3301 GAAAATGTCG GCATTATGCA GGCGGAGGAA GAGAAAAAAC GGGTGCAGGC 
3351 GGATAAAGAC AGCGCNTTGG CGAAACAGCG CGAAGCGGAA ACCCGGCCGG 
3401 NTACCACCGC CTTCCCCCGC GCCCGCNGCG CCCGCCGGGA TTTGCCGCAA 
3451 CCGCAGCCCC AACCGCAACC TCAACCCCAA CCGCAGCGCG ACCTGATNAG 
3501 CCGTTATGCC AATAGCGGTT TGAGTGAATT TTCCGCCACG CTCAACAGCG 
3551 TTTTCGCCGT ACAGGACGAA TTGGACCGCG TGTTTGCCGA AGACCGCCGC 
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3601 AACGCNGTTT GGACAAGCNG CATCCGGNAC ACCAAACACT ACCGTTCGCA 

3651 AGATTTCCGC GCCTACCGCC AACAAACCGA CCTGCGCCAA ATCGGTATGC 

3701 AGAAAAACCT CGGCAGCGGG CGCGTCGGCA TCCTGTTTTC GCACAACCGG 

3751 ACCGAAAACA NCTTCGACGA CGGCATCGGC AACTCGGCAC GGCTTGCCCA 

3801 CGGCGCCGTT TTCGGGCAAT ACGGCATCGG CAGGTTCGAC ATCGGCATCA 

3851 GCACGGGCGC GGGTTTTAGC AGCGGCANTC TNTCAGACGG CATCGGAGGC 

3901 AAAATCCGCC GCCGCGTGCT GCATTACGGC ATTCAGGCAC GATACCGCGC 

3951 CGGTTTCGGC GGATTCGGCA TCGAACCGTA CATCGGCGCA ACGCGCTATT 

4001 TCGTCCAAAA AGCGGATTAC CGCTACGAAA ACGTCAATAT CGCCACCCCC 

4051 GGTCTTGCGT TCAACCGNTA CCGNGCGGGC ATTAAGGCAG ATTATTCATT 

4101 CAAACCGGCG CAACACATNT CCATCACNCC TTATTTNAGC CTGTCCTATA 

4151 CCGATGCCGC TTCGGGCAAA GTCCGAACAC GCGTCAATAC CGCNGTATTG 

4201 GCTCAGGATT TCGGCAAAAC CCGCAGTGCG GAATGGGGCG TAAACGCCGA 

4251 AATCAAAGGT TTCACGCTGT CCNTCCACGC TGCCGCCGCC AAAGGNCCGC 

4301 AACTGGAAGC GCAACACAGC GCGGGCATCA AATTAGGCTA CCGCTGGTAA 

This encodes a protein having amino acid sequence <SEQ ED 652>: 

1 MKTTDKRTTE THRKAPKTGR IRFSPAYLAI CLSFGIL PQA WAGHTYFGIN 

51 YQYYRDFAEN KGKFAVGAKD lEVYNKKGEL VGKSMTKAPM IDFSVVSRNG 

101 VAALVGDQYI VSVAHNGGYN NVDFGAEGXN PDQHRFSYQI VKRNNYKPDN 

151 SHPYNGDXHM PRLHKFVTDA EPVEMTSDMR GNTYSDKEKY PERVRIGSGH 

201 HYWRYDDDKH GDLSYSGAWL IGGNTHMQGW GNNGVXSLSG DVRHANDYGP 

251 MPIAGAAGDS GSPMFIYDKT NNKWLLNGVL QTGYPYSGRE NGFQLIRKDW 

301 FYDDIYRGDT HTVXFEPRSN GHFSFTSNNN GTGTVTETNE KVSNPKLKVQ 

351 TVRLFDESLN ETDKEPVYAA GGVNQYRPRL NNGENLSFID YGNGKLILSN 

4 01 NINQGAGGLY FEGDFTVSPE NNETWQGAGV HISEDSTVTW KVNGVANDRL 

451 SKIGKGTLHV QAKGENQGSI SVGDGTVILD QQADDKGKKQ AFSEIGLXSG 

501 RGTVQLNADN QFNPDKLYFG FRGGRLDLNG HSLSFHRIQN TDEGAMIXXH 

551 NATTTSTVTI TGNESITQPS GKNINRLNYS KEIAYNGWFG EKDTTKTNGR 

601 LNLVYQPAAE DRTXLLSGGT NLNGNITQTN GKLFFSGRPT PHAYNHLGSG 

651 WSKMEGIPQG EIVWDNDWIX RTFKAENFHI QGGQAVISRN VAKVEGDXHL 

701 SNHAQAVFGV APHQSHTICT RSDWTGLTNC VEXXITDDKV lASLTKTDXS 

751 GXVXLXXXXX XXLXGXAXLX GNLSANGDTR YTVSHNATQN GNLSLVGNAQ 

801 ATFNQATLNG NXSXSGNASF NLSNNAAQNG SLTLSDNAKA NVSHSALNGN 

851 VSLADKAVFH FENSRFTGQL SGSKXTALHL KDSEWTLPSG TELGNLNLDN 

901 ATITLNSAYR HDAAGAQTGX VSDTPRRRSR RSLLSVTPPT SVESRFNTLT 

951 VNGKLNXQGT FRFMSELFGY RSDKLKLAES SEGTYTLAVN NTGNEPVSLD 

1001 QLTWEGKDN KPLSENLNFT LQNEHVDAGA WRYQLIRKDG EFRLHNPVKE 

1051 QELSDKLGKA EAKKQAEKDN AQSLDALIAA GRDAAEKTES VAEPARXAGG 

1101 ENVGIMQAEE EKKRVQADKD SALAKQREAE TRPXTTAFPR ARXARRDLPQ 

1151 PQPQPQPQPQ PQRDLXSRYA NSGLSEFSAT LNSVFAVQDE LDRVFAEDRR 

1201 NAVWTSXIRX TKHYRSQDFR AYRQQTDLRQ IGMQKNLGSG RVGILFSHNR 

1251 TENXFDDGIG NSARLAHGAV FGQYGIGRFD IGISTGAGFS SGXLSDGIGG 

1301 KIRRRVLHYG IQARYRAGFG GFGIEPYIGA TRYFVQKADY RYENVNIATP 

1351 GLAFNRYRAG IKADYSFKPA QHXSITPYXS LSYTDAASGK VRTRVNTAVL 

1401 AQDFGKTRSA EWGVNAEIKG FTLSXHAAAA KGPQLEAQHS AGIKLGYRW* 

A transmembrane region is underlined. 

ORFl-1 shows 86.3% identity over a 1462aa overlap with ORFla: 



MKTTDKRTTETHRKAPKTGRIRFS PAYLAI CLS FGILPQAWAGHTYFG IN YQYYRDFAEN 

I I j I I I I I I I I I I I ! I I I I I I I I ! I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I 

MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQAWAGHTYFGINYQYYRDFAEN 



KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALVGDQYIVSVAHNGGYN 

I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I I I I I 

KGKFftVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALVGDQYIVSVAHNGGYN 



NVDFGAEGXNPDQHRFSYQIVKRNNYKPDNS-HPYNGDXHMPRLHKFVTDAEPVEMTSDM 
NVDFGAEGRNPDQHRFTYKIVKRNNYKAGTKGHPYGGDYHMPRLHKFVTDAEPVEMTSYM 
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orf la.pep 
orfl-1 



RGNTYSDKEKYPERVRIGSGHHYWRYDDDKHGDL--SYSGA WLIGGNTHMQGWGNN 

DGRKYIDQNNYPDRVRIGAGRQYWRSDEDEPNNRESSYHIASAYSWLVGGNTFAQNGSGG 



GVXSLSGD-VRHANDYGPMPIAGAAGDSGSPMFIYDKTNNKWLLNGVLQTGYPYSGRENG 
I : : I : : : : : I : II : I : I : I II I I I I I I I I :: I II : II II I I I II I: II 
GTVNLGSEKIKHS-PYGFLPTGGSFGDSGSPMFIYDAQKQECWLINGVLQTGNPYIGKSNG 



orfla.pep 
orfl-1 



FQLIRKDWFYDDIYRGDTHTVXFEPRSNGHFSFTSNNNGTGTVTETNEKVSNP-KLKVQT 
FQLVRKDWFYDEIFAGDTHSVFYEPRQNGKYSFNDDNNGTGKINAKHEHNSLPNRLKTRT 



orfla.pep 
orfl-1 



VRLFDESLNETDKEPVY-AAGGVNQYRPRLNNGENLSFIDYGNGKLILSNNINQGAGGLY 
VQLFNVSLSETAREPVYHAAGGTOSYRPRLNNGENISFIDEGKGELILTSNINQGAGGLY 



orfla.pep 
orfl-1 



FEGDFTVSPENNETWQGAGVHI3EDSTVTWKVNGVANDRLSKIGKGTLHVQAKGENQGSI 
FQGDFTVSPENNETWQGAGVHI3ED3TVTWKVNGVANDRLSKIGKGTLHVQAKGENQGSI 



orfla.pep 
orfl-1 



SVGDGTVILDQQADDKGKKQAFSEIGLXSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNG 
SVGDGTVILDQQADDKGKKQAFSEIGLV3GRGTVQLNADNQFNPDKLYFGFRGGRLDLNG 



irfla.pep 
.rfl-1 



HSLSFHRIQNTDEGAMIXXHNATTTSTVTITGNESITQPSGKNINRLNYSKEIAYNGWFG 
HSLSFHRIQNTDEGAMIVNHNQDKESTVTITGNKDIAT-TGNN-NSLDSKKEIAYNGWFG 



orfla.pep 
orfl-1 



EKDTTKTNGRLNLVYQPAAEDRTXLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLGSG 
EKDTTKTNGRLNLVYQPAAEDRTLLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLNDH 



irfla.pep 
>rfl-l 



WSKMEGIPQGEIVWDNDWIXRTFKAENFHIQGGQAVISRNVAKVEGDXHLSNHAQAVFGV 
WSQKEGIPRGEIVWDNDWINRTFKAENFQIKGGQAWSRNVAKVKGDWHLSNHAQAVFGV 



orfla.pep 
orfl-1 



APHQSHTICTRSDWTGLTNCVEXXITDDKVIASLTKTDXSGXVXLXXXXXXXLXGXAXLX 
APHQSHTICTRSDWTGLTNCVEKTITDDKVIA3LTKTDISGNVDLADHAHLNLTGLATLN 



orfla.pep 
orfl-1 



GNLSANGDTRYTVSHNATQNGNLSLVGNAQATFNQATLNGNXSXSGNASFNLSNNAAQNG 
GNLSANGDTRYTVSHNATQNGNLSLVGNAQATFNQATLNGNTSASGNASFNLSDHAVQNG 



orfla.pep 
orfl-1 



SLTLSDNAKANVSHSAINGNVSLADKAVFHFENSRFTGQLSGSKXTALHLKDSEWTLPSG 
SLTLSGNAKANVSHSALNGNVSLADKAVFHFESSRFTGQISGGKDTALHLKDSEWTLPSG 
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orf la.pep 
orfl-1 



TELGNLNLDNATITLNSAYRHDAAGAQTGXVSDTPRRRSRRS LLSVTPPTSVESRFN 

TELGNLNLDNATITLNSAYRHDAAGAQTGSATDAPRRRSRRSRRSLLSVTPPTSVESRFN 



orf la.pep 

orfl-1 



orf la.pe 
40 orfl-1 



orf la.pep 
orfl-1 



orfla 
orfl-: 



1130 1140 1150 1160 1170 1180 

EAETRPXTTAFPRARXARRDLPQPQPQPQPQPQPQRDLXSRYANSGLSEFSATLNSVFAV 

EAETRPATTAFPRARRARRDLPQLQPQPQPQP--QRDLISRyANSGLSEFSATLNSVFAV 
1140 1150 1160 1170 1180 1190 



Homology with adhesion and penetration protein hap precursor of H.influenzae (accession number P45387') 
Amino acids 23-423 of ORFl show 59% aa identity with hap protein in 450aa overlap: 

orfl 23 FXAAYLAICLSFGILPQAWAGHTYFGINYQYYRDFAENKGKFAVGAKDIEVYNKKGELVG 82 

F +L C+S GI QAWAGHTYFGI+YQYYRDFAENKGKF VGAK+IEVYNK+G+LVG 
hap 6 FRLNFLTACVSLGIASQAWAGHTYFGIDYQYYRDFAENKGKFTVGAKNIEVYNKEGQLVG 65 

orfl 83 KSMTKAPMIDFSWSRNGVAALVGVQYIVSVAHNGGYNNVDFGAEGXNIXDQXRXTYKIV 142 

SMTKAPMIDFSWSRNGVAALVG QYIVSVAHNGGYN+VDFGAEG N DQ R TY+IV 
hap 66 TSMTKAPMIDFSWSRNGVAALVGDQYIVSVAHNGGYNDVDFGAEGRN-PDQHRFTYQIV 124 
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orfl 


143 




125 


orf 1 


203 




185 


orfl 


223 




245 








305 


orfl 


335 




364 


orfl 


394 




424 


10 acids 715- 


Orfl 


41 




733 


orfl 


99 




793 


orfl 


159 




853 


orfl 


219 




900 


orfl 


279 




960 



- HPY GDYHMPRLHK VT+AEPV MT+ MDG+ Y D+ NYP+RVRIG+GR 



SGSPMFIYDA+K++WLIN VLQTG+P+ G+ NGFQL+R++WFY+E+ A DT SVF 



A G N Y+PR+ G+NI D+GKG L + +NINQGAGGLYF+G+F V 



GV I +D+TV WKV+ NDRLSKIG GTL 



25 Amino acids 715-101 1 of ORFl show 50% aa identity with hap protein in 258aa overlap: 



L L+N+T+TLNSAY + S+ +AP L T PTS E RFNTLTVN 



GKL+GQGTF+F S LFGY+SDKLKL+ +EG YTL+V NTG EP +LEQLT++E DNKP 



LS+ L FTL+N+HVDAGA 
LSDKLKFTLENDHVDAGA 977 

Amino acids 1 192-1450 of ORFl show 41% aa identity with hap protein in 259aa overlap: 

Orfl 1 LDRVFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQTDLRQIGMQKNLGSGRVGILFSHNR 60 

LDR+F + ++AVWT+ +D + Y S FRAY+Q+T+LRQIG+QK L +GR+G +FSH+R 
hap 1135 LDRLFVDQAQSAVWTNIAQDKRRYDSDAFRAYQQKTNLRQIGVQKALANGRIGAVFSHSR 1194 

orfl 61 TENTFDDGIGNSARLAHGAVFGQYGIDRFYXXXXXXXXXXXXXXXXXIGXJQCRRRVLHYG 120 

++NTFD+ +NAL+FQY KR+ ++YG 

hap 1195 SDNTFDEQVKNHATLTMMSGFAQYQWGDLQFGVNVGTGISASKMAEEQSRKIHRKAINYG 1254 

orfl 121 IQARYRAGFGGFGIEPHIGATRYFVQKADYRYENVNIATPGLAFNRYRAGIKADYSFKPA 18 0 

+ A Y+ G GI + P+ G RYF+++ +Y+ E V + TP LAFNRY AGI+ DY+F P 
hap 1255 VNASYQFRLGQLGIQPYFGVNRYFIERENYQSEEVRVKTPSLAFNRYNAGIRVDYTFTPT 1314 

orfl 181 QHISITPYLSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEIKGFTLSLHAAAA 240 

+IS+ PY ++Y D ++ V+T VN VL Q FG+ E G+ AEI F +S + + 
hap 1315 DNISVKPYFFVNYVDVSNANVQTTVNLTVLQQPFGRYWQKEVGLKAEILHFQISAFISKS 137 4 

orfl 241 KGPQLEAQHSAGIKLGYRW 259 

+G QL Q + G+KLGYRW 
hap 1375 QGSQLGKQQNVGVKLGYRW 1393 
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Homologv with a predicted ORF from N.gonorrhoeae 

The blocks of ORFl show 83.5%, 88.3%, and 97.7% identities in 467, 298, and 259 aa overlap, 
respectively with a predicted ORF (ORFlng) from N.gonorrhoeae: 



orfl.pep 

orf 1 . pep 
orf 1 . pep 
orfl.pep 
orfl .pep 

orf 1 . pep 

orfl . pep 

orfl . pep 

orfl . pep 
orflng 
orfl . pep 

orfl.pep 

orf 1 .pep 

orfl . pep 

orf 1 .pep 



MKTTDKRTTETHRKAPKTGRIRFXAAYLAICLSFGILPQAWAGHTYFGINYQYYRDFAEN 
MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQARAGHTYFGINYQYYRDFAEN 

KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALVGVQYIVSVAHNGGYN 

! I i I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I I I i I I I I I 1 I I : I I I I I I I I I I I I I I 
KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALAGDQYIVSVAHNGGYN 

NVDFGAEGXNIXDQXRXTYKIVKRNNYKAGTKGHPYGGDYHMPRLHKXVTDAEPVEMTSY 

NVDFGAEGSN-PDQHRFSYQIVKRNKYKAGTNGHPYGGDYHMPRLHKFVTDAEPVEMTSY 

MDGRKYIDQNNYPDRVRIGAGRQYWRSDEDEPNNRESSYHIAS 



Mill 



I I I 



MDGWKYADLNKYPDRVRIGAGRQYWRSDEDEPNNRESSYHIASAYSWLVGGNTFAQNGSG 
GSPMFIYDA QKQKWLIN GVLOTGNPYIGKSNG 

I I I I I I I I I I I I I I I I II I I I I I I I II M I I I 

GGTVNLGSEKIKHSPY GFLPTGGSFGDSGSPMFIYDA QKQKWLIM GVLOTGNPYXGKSNG 

FOLVRKDWFYDEIFAGDTHSVFYEPRONGKYSFNDDNNGTGKINAKHEHNSLPNRLKTRT 

FOLVRKDWFYDEIFAGDTHSVFYEPHQNGKYFFNDNNNGAGKIDAKHKHYSLPYRLKTRT 

VQLFNVSLSETAREPVYHAAGGVNSYRPRLNNGENISFIDEGKGELILTSNINQGAGGLY 
I I I I I i I II I I I I I I I I I I I II I I I II II I I II II I II I : I I I I I I I I I I II I I I II I I 
VQLFNVSLSETAREPVYHAAGGVNSYRPRLNNGENISFIDKGKGELILTSNINQGAGGLY 

FQGDFTVSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGT 

FEGNFTVSPKNNETWQGAGVHISDGSTVTWKVNGVANDRLSKIGKGTLLVQAKGENQGSV 
// 

DKVTASLTKTDISGNVDLADHAHLNLTGLA 
III II I : II I : I II : I I I I ! II I I I I II 
FGVAPHQSHTICTRSDWTGLTSCTEKTITDDKVIASLSKTDVRGNVSLADHAHLNLTGLA 

TLNGNLSANGDTR-YTVSHNATQNGNXSLVXNAQATFNQATLNGNTSASGNASFNLSDHA 

TFNGNL-VQAETRTIRLRANATQNGNLSLVGNAQATFNQATLNGNTSASDNASFNLSNNA 

VQNGSLTLSGNAKANVSHSALNGNVSLADfCAVFHFESSRFTGQISGGKDTALHLKDSEWT 

VQNGSLTLSDNAKANVSHSALNGNVSLADKAVFHFENSRFTGKISGGKDTALHLKDSEWT 

LPSGXELGNLNLDNATITLNSAYRHDAAGAQTGSATDAPRRRSRRSRRSLLXVTPPTSVE 

LPSGTELGNLNLDNATITLNSAYRHDAAGAQTGSAADAPRRRSRRS LLSVTPPTSAE 

SRFNTLTVNGKLNGQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPASLEQLT 

SRFNTLTVNGKLNGQGTFRFMSELFGYRSGKLKLAESSEGTYTLAVNNTGNEPVSLEQLT 

WEGKDNKPLSENLNFTLQNEHVDAGAW 

WEGKDNTPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAGET 
// 

LDRVFAEDRRNAVWTSGIRDTKHYRSQDFR 
PQRDLISRYANSGLSEFSATLNSVFAVQDELDRVFAEDRRNAVWTSGIRDTKHYRSQDFR 
AYRQQTDLRQIGMQKNLGSGRVGILFSHNRTENTFDDGIGNSARLAHGAVFGQYGIDRFY 
AYRQQTDLRQIGMQKNLGSGRVGILFSHNRTGNTFDDGIGNSARLAHGAVFGQYGIGRFD 



1010 
1011 
1070 
1211 
1239 
1271 
1299 
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orfl.pep IGISAGAGFSSGSLSDGIGXKXRRRVLHYGIQARYRAGFGGFGIEPHIGATRYFVQKADY 1331 

orflng IGISAGAGFSSGSLSDGIRGKIRRRVLHYGIQARYRAGFGGFGIEPHIGATRYFVQKADY 135 9 

orf 1 .pep RYENVNIATPGLAFNRYRAGIKADYSFKPAQHISITPYLSLSYTDAASGKVRTRVNTAVL 1391 

orflng RYENVNIATPGLAFNRYRAGIKADYSFKPAQHISITPYLSLSYTDAASGKVRTRVNTAVL 1419 

orfl.pep AQDFGKTRSAEWGVNAEIKGFTLSLHAAAAKGPQLEAQHSAGIKLGYRW 1440 

orflng AQDFGKTRSAEWGVNAEIKGFTLSLHAAAAKGPQLEAQHSAGIKLGYRW 14 68 

The complete length ORFlng nucleotide sequence was identified <SEQ ID 653>: 



1 ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCTAA 

51 AACCGGCCGC ATCCGCTTCT CGCCCGCTTA CTTAGCCATA TGCCTGTCGT 

101 TCGGCATTCT GCCCCAAGCC CGGGCGGGAC ACACTTATTT CGGCATCAAC 

151 TACCAATACT ATCGCGACTT TGCCGAAAAT AAAGGCAAGT TTGCAGTCGG 

201 GGCGAAAGAT ATTGAGGTTT ACAACAAAAA AGGGGAGTTG GTCGGCAAAT 

251 CGATGACGAA AGCCCCGATG ATTGATTTTT CTGTGGTATC GCGTAACGGC 

301 GTGGCGGCAT TGGCGGGCGA TCAATATATT GTGAGCGTGG CACATAACGG 

351 CGGCTATAAC AATGTTGATT TTGGTGCGGA GGGAAGCAAT CCCGATCAGC 

4 01 ACCGCTTTTC TTACCAAATT GTGAAAAGAA ATAATTATAA AGCAGGGACT 

451 AACGGCCATC CTTATGGCGG CGATTATCAT ATGCCGCGTT TGCACAAATT 

501 TGTCACAGAT GCAGAACCTG TTGAGATGAC CAGTTATATG GATGGGTGGA 

551 AATACGCTGA TTTAAATAAA TACCCTGATC GTGTTCGAAT CGGAGCAGGC 

601 AGACAATATT GGCGGTCTGA TGAAGACGAA CCCAATAACC GCGAAAGTTC 

651 ATATCATATT GCAAGCGCAT ATTCTTGGCT CGTCGGTGGC AATACCTTTG 

701 CACAAAATGG ATCAGGTGGT GGCACAGTCA ACTTAGGTAG CGAAAAAATT 

751 AAACATAGCC CATATGGTTT TTTACCAACA GGAGGCTCAT TTGGCGACAG 

801 TGGCTCACCA ATGTTTATCT ATGATGCCCA AAAGCAAAAG TGGTTAATTA 

851 ATGGGGTATT GCAAACAGGC AACCCCTATA TAGGAAAAAG CAATGGCTTC 

901 CAGCTAGTTC GTAAAGATTG GTTCTATGAT GAAATCTTTG CTGGAGATAC 

951 CCATTCAGTA TTCTACGAAC CACATCAAAA TGGGAAATAC TTTTTTAACG 

1001 ACAATAATAA TGGCGCAGGA AAAATCGATG CCAAACATAA ACACTATTCT 

1051 CTACCTTATA GATTAAAAAC ACGAACCGTT CAATTGTTTA ATGTTTCTTT 

1101 ATCCGAGACA GCAAGAGAAC CTGTTTATCA TGCTGCAGGT GGGGTCAACA 

1151 GTTATCGACC CAGACTGAAT AATGGAGAAA ATATTTCCTT TATTGACAAA 

1201 GGAAAAGGTG AATTGATACT TACCAGCAAC ATCAACCAAG GCGCGGGCGG 

1251 TTTGTATTTT GAGGGTAATT TTACGGTCTC GCCTAAAAAC AACGAAACGT 

1301 GGCAAGGCGC GGGCGTTCAT ATCAGTGATG GCAGTACCGT TACTTGGAAA 

1351 GTAAACGGCG TGGCAAACGA CCGCCTGTCC AAAATCGGCA AAGGCACGCT 

1401 GCTGGTTCAA GCCAAAGGGG AAAACCAAGG CTCGGTCAGC GTGGGCGACG 

1451 GTAAAGTCAT CTTAGATCAG CAGGCGGACG ATCAAGGCAA AAAACAAGCC 

1501 TTTAGTGAAA TCGGCTTGGT CAGCGGCAGG GGGACGGTGC AACTGAATGC 

1551 CGATAATCAG TTCAACCCCG ACAAACTCTA TTTCGGCTTT CGCGGCGGAC 

1601 GTTTGGATTT GAACGGGCAT TCGCTTTCGT TCCACCGCAT TCAAAATACC 

1651 GATGAAGGGG CGATGATTGT CAACCACAAT CAAGACAAAG AATCCACCGT 

1701 TACCATTACA GGCAATAAAG ATATTACTAC AACCGGCAAT AACAACAACT 

1751 TGGATAGCAA AAAAGAAATT GCCTACAACG GTTGGTTTGG CGAGAAAGAT 

1801 GCAACCAAftA CGAACGGGCG GCTCAATCTG AATTACCAAC CGGAAGAAGC 

1851 GGATCGCACT TTACTGCTTT CCGGCGGAAC AAATTTAAAC GGCAATATCA 

1901 CGCAAACAAA CGGCAAACTG TTTTTCAGCG GCAGACCGAC ACCGCACGCC 

1951 TACAATCATT TAGGAAGCGG GTGGTCAAAA ATGGAAGGTA TCCCACAAGG 

2 001 AGAAATCGTG TGGGACAACG ATTGGATCGA CCGCACATTT AAAGCGGAAA 

2051 ACTTCCATAT TCAGGGCGGA CAAGCGGTGG TTTCCCGCAA TGTTGCCAAA 

2101 GTGGAAGGCG ATTGGCATTT AAGCAATCAC GCCCAAGCAG TTTTCGGTGT 

2151 CGCACCGCAT CAAAGCCACA CAATCTGTAC ACGTTCGGAC TGGACGGGTC 

2201 TGACAAGTTG TACCGAAAAA ACCATTACCG ACGATAAAGT GATTGCTTCA 

2251 TTGAGCAAGA CCGACATCAG AGGCAATGTC AGCCTTGCCG ATCACGCTCA 

2301 TTTAAATCTC ACAGGACTTG CCACACTCAA CGGCAATCTT AGTGCAGGCG 

2351 GAGACACGCA CTATACGGTT ACGCGCAACG CCACCCAAAA CGGCAACCTC 

2401 AGCCTCGTGG GCAATGCCCA AGCAACATTT AATCAAGCCA CATTAAACGG 

2451 CAACACATCG GCTTCGGACA ATGCTTCATT TAATCTAAGC AACAACGCCG 

2501 TACAAAACGG CAGTCTGACG CTTTCCGACA ACGCTAAGGC AAACGTAAGC 

2551 CATTCCGCAC TCAACGGCAA TGTCTCCCTA GCCGATAAGG CAGTATTCCA 

2601 TTTTGAAAAC AGCCGCTTTA CCGGAAAAAT CAGCGGCGGC AAGGATACGG 

2651 CATTACACTT AAAAGACAGC GAATGGACGC TGCCGTCGGG CACGGAATTA 

2701 GGCAATTTAA ACCTTGACAA CGCCACCATT ACACTCAATT CCGCCTATCG 

2751 ACACGATGCG GCAGGCGCGC AAACCGGCAG TGCGGCAGAT GCGCCGCGCC 

2801 GCCGTTCGCG CCGTTCCCTA TTATCCGTTA CGCCGCCAAC TTCGGCAGAA 

2851 TCCCGTTTCA ACACGCTGAC GGTAAACGGC AAATTGAACG GTCAGGGAAC 
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2 901 ATTCCGCTTT ATGTCGGAAC TCTTCGGCTA CCGCAGCGGC AAATTGAAGC 

2 951 TGGCGGAAAG TTCCGAAGGC ACTTACACCT TGGCTGTCAA CAATACCGGC 
3001 AACGAACCCG TAAGTCTCGA GCAATTGACG GTAGTGGAAG GAAAAGACAA 
3051 CACACCGCTG TCCGAAAATC TTAATTTCAC CCTGCaaaAc gaacacgtcg 
3101 atgccggcgc atggCGTTAT CAGCTTATCC gcaaagacgG CGAGTTCCgc 
3151 CTGCATAATC CGGTCAAAGA ACAAGAGCTT TCCGACAAAC TCGGCAAGgc 
3201 gggagaaACA GAggccgccT TGACGGCAAA ACAGGCacaA CTTGCCGCCA 
3251 AAcaacaggc ggaaaAAGAC AACgcgcaaa gccttgAcgc gctgattgcg 
3301 gCcgggcgca atgccaccga AAAGGCAgaa agtgttgccg aaccgGCCCG 
3351 GCAGGCAGGC GGGGAAAAtg ccgGCATTAT GCAGGCGGAG GAAGAGAAAA 
3401 AACGGGTGCA GGCGGATAAA GACACCGCCT TGGCGAAACA GCGCGAAGCG 
3451 GAAACCCGGC CGGCTACCAC CGCCTTCCCC CGCGCCCGCC GCGCCCGCCG 
3501 GGATTTGCCG CAACCGCAGC CCCAACCGCA ACCCCAACCG CAGCGCGACC 
3551 TGATCAGCCG TTATGCCAAT AGCGGTTTGA GTGAATTTTC CGCCACGCTC 
3601 AACAGCGTTT TCGCCGTACA GGACGAATTG GACCGCGTGT TTGCCGAAGA 
3651 CCGCCGCAAC GCCGTTTGGA CAAGCGGCAT CCGGGACACC AAACACTACC 
3701 GTTCGCAAGA TTTCCGCGCC TACCGCCAAC AAACCGACCT GCGCCAAATC 
3751 GGTATGCAGA AAAACCTCGG CAGCGGGCGC GTCGGCATCC TGTTTTCGCA 
3801 CAACCGGACC GGAAACACCT TCGACGACGG CATCGGCAAC TCGGCACGGC 
3851 TTGCCCACGG TGCCGTTTTC GGGCAATACG GCATCGGCAG GTTCGACATC 
3901 GGCATCAGCG CGGGCGCGGG TTTTAGTAGC GGCAGCCTTT CAGACGGCAT 

3 951 CAGAGGCAAA ATCCGCCGCC GCGTGCTGCA TTACGGCATT CAGGCAAGAT 

4 001 ACCGCGCAGG TTTCGGCGGA TTCGGCATCG AACCGCACAT CGGCGCAACG 
4 051 CGCTATTTCG TCCAAAAAGC GGATTACCGA TACGAAAACG TCAATATCGC 
4101 CACCCCGGGC CTTGCATTCA ACCGCTACCG CGCGGGCATT AAGGCAGATT 
4151 ATTCATTCAA ACCGGCGCAA CACATTTCCA TCACGCCTTA TTTGAGCCTG 
4201 TCCTATACCG ATGCCGCTTC CGGCAAAGTC CGAACGCGCG TCAATACCGC 
4251 CGTATTGGCG CAGGATTTCG GCAAAACCCG CAGTGCGGAA TGGGGCGTAA 
4301 ACGCCGAAAT CAAAGGTTTC ACGCTGTCCC TCCACGCTGC CGCCGCCAAG 
4351 GGGCCGCAAT TGGAAGCGCA GCACAGCGCG GGCATCAAAT TAGGCTACCG 
4 401 CTGGTAA 

This is predicted to encode a protein having amino acid sequence <SEQ ID 654>: 

1 MKTTDKRTTE THRPCAPKTGR IRFSPAYLAI CLSFGILPQA RAGHTYFGIN 

51 YQYYRDFAEN KGKFAVGAKD lEVYNKKGEL VGKSMTKAPM IDFSWSRNG 

101 VAALAGDQYI VSVAHNGGYN NVDFGAEGSN PDQHRFSYQI VKRNNYKAGT 

151 NGHPYGGDYH MPRLHKFVTD AEPVEMTSYM DGWKYADLNK YPDRVRIGAG 

201 RQYWRSDEDE PNNRESSYHI ASAYSWLVGG NTFAQNGSGG GTVNLGSEKI 

251 KHSPY GFLPT GGSFGDSGSP MFIYDAQ KQK WLIN GVLOTG NPYIGKSNGF 

301 QLVRKDWFYD EIFAGDTHSV FYEPHQNGKY FFNDNNNGAG KIDAKHKHYS 

351 LPYRLKTRTV QLFNVSLSET AREPVYHAAG GVNSYRPRLN NGENISFIDK 

401 GKGELILTSN INQGAGGLYF EGNFTVSPPCN NETWQGAGVH ISDGSTVTWK 

451 VNGVANDRLS KIGKGTLLVQ AKGENQGSVS VGDGKVILDQ QADDQGKKQA 

501 FSEIGLVSGR GTVQLNADNQ FNPDKLYFGF RGGRLDLNGH SLSFHRIQNT 

551 DEGAMIVNHN QDKESTVTIT GNKDITTTGN NNNLD3KKEI AYNGWFGEKD 

601 ATKTNGGLNL NYPPEEADRT LLLSGGTNLN GNITQTNGKL FFSGRPTPHA 

651 YNHLGSGWSK MEGIPQGEIV WDNDWIDRTF KAENFHIQGG QAW3RNVAK 

701 VEGDWHLSNH AQAVFGVAPH QSHTICTR3D WTGLTSCTEK TITDDKVIAS 

751 LSKTDVRGNV SLADHAHLNL TGLATFNGNL VQAETRTIRL RANATQNGNL 

801 SLVGNAQATF NQATLNGNTS ASDNASFNLS NNAVQNGSLT LSDNAKANVS 

851 HSALNGNVSL ADKAVFHFEN SRFTGKISGG KDTALHLKDS EWTLPSGTEL 

901 GNLNLDNATI TLNSAYRHDA AGAQTGSAAD APRRRSRRSL LSVTPPTSAE 

951 SRFNTLTVNG KLNGQGTFRF MSELFGYRSG KLKLAESSEG TYTLAVNNTG 

1001 NEPVSLEQLT WEGKDNTPL SENLNFTLQN EHVDAGAWRY QLIRKDGEFR 

1051 LHNPVKEQEL SDKLGKAGET EAALTAKQAQ LAAKQQAEKD NAQSLDALIA 

1101 AGRNATEKAE SVAEPARQAG GENAGIMQAE EEKKRVQADK DTALAKQREA 

1151 ETRPATTAFP RARRARRDLP QPQPQPQPQP QRDLISRYAN SGLSEFSATL 

1201 NSVFAVQDEL DRVFAEDRRN AVWTSGIRDT KHYRSQDFRA YRQQTDLRQI 

1251 GMQKNLGSGR VGILFSHNRT GNTFDDGIGN SARLAHGAVF GQYGIGRFDI 

1301 GISAGAGFSS GSLSDGIRGK IRRRVLHYGI QARYRAGFGG FGIEPHIGAT 

1351 RYFVQKADYR YENVNIATPG LAFNRYRAGI KADYSFKPAQ HISITPYLSL 

1401 SYTDAASGKV RTRVNTAVLA QDFGKTRSAE WGVNAEIKGF TLSLHAAAAK 

1451 GPQLEAQHSA GIKLGYRW* 

Underlined and double-underlined sequences represent the active site of a serine protease (trypsin 
family) and an ATP/GTP -binding site motif A (P-loop). 



65 ORFl-1 and ORFlng show 93.7% identity in 1471 aa overlap: 
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MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQAWAGHTYFGINYQYYRDFAEN 
MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQARAGHTYFGINYQYYRDFAEN 



KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALVGDQYIVSVAHNGGYN 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I 
KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSVVSRNGVAALAGDQYIVSVAHNGGYN 



NVDFGAEGRNPDQHRFTYKIVKRNNYKAGTKGHPYGGDYHMPRLHKFVTDAEPVEMTSYM 
NVDFGAEGSNPDQHRFSYQIVKRNNYKAGTNGHPYGGDYHMPRLHKFVTDAEPVEMTSYM 



DGRKYIDQNNYPDRVRIGAGRQYWRSDEDEPNNRESSYHIASAYSWLVGGNTFAQNGSGG 
DGWKYADLNKYPDRVRIGAGRQYWRSDEDEPNNRESSYHIASAYSWLVGGNTFAQNGSGG 



GTVNLGSEKIKHSPYGFLPTGGSFGDSGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNGF 
GTVNLGSEKIKHSPYGFLPTGGSFGDSG3PMFIYDAQKQKWLINGVLQTGNPYIGKSNGF 



QLVRKDWFYDEIFAGDTHSVFYEPRQNGKYSFNDDNNGTGKINAKHEHNSLPNRLKTRTV 
QLVRKDWFYDEIFAGDTHSVFYEPHQNGKYFFNDNNNGAGKIDAKHKHYSLPYRLKTRTV 



QLFNVSLSETAREPVYHAAGGVNSYRPRLNNGENISFIDEGKGELILTSNINQGAGGLYF 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I 
QLFNVSLSETAREPVYHAAGGVNSYRPRLNNGENISFIDKGKGELILTSNINQGAGGLYF 



QGDFTVSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGTLHVQAKGENQGSIS 
: I : I I I I I : I I I I I I I I I I I I I : I I I I I I I I I I I I i I I I I I I I I I I I I I I I I I M I : I 
EGNFTVSPKNNETWQGAGVHISDGSTVTWKVNGVANDRLSKIGKGTLLVQAKGENQGSVS 



VGDGTVILDQQADDKGKKQAFSEIGLVSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNGH 

1111 I I I I I I j I I : I ! I I I I I I 1 I I i I I I I ! I I I I I I I I I I ! I I I I I I I I I I I I I I I I I 
VGDGKVI LDQQADDQGKKQAFSE IGLVSGRGT VQLNADNQFNPDKL Y FG FRGGRLDLNGH 



SLSFHRIQNTDEGAMIVNHNQDKESTVTITGNKDIATTGNNNSLDSKKEIAYNGWFGEKD 
I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I 1 I I : I I I I I I : I I I I I I I I I I I I I I I I I 
SLSFHRIQNTDEGAMIVNHNQDKESTVTITGNKDITTTGNNNNLDSKKEIAYNGWFGEKD 



TTKTNGRLNLVYQPAAEDRTLLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLNDHWSQ 
ATKTNGRLNLNYQPEEADRTLLLSGGTNLNGNITQTNGKLFFSGRPTPHAY 



KEGIPRGEIVWDNDWINRTFKAENFQIKGGQAWSRNVAKVKGDWHLSNHAQAVFGVAPH 
MEGIPQGEIVWDNDWIDRTFKAENFHIQGGQAWSRNVAKVEGDWHLSNHAQAVFGVAPH 
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orfl-l.pep 
orflng-1 



QSHTICTRSDWTGLTNCVEKTITDDKVIASLTKTDISGNVDLADHAHLNLTGLATLNGNL 
QSHTICTRSDWTGLTSCTEKTITDDKVIASLSKTDIRGNVSLADHAHLNLTGLATLNGNL 



orf 1-1 .pep 



SANGDTRYTVSHNATQNGNLSLVGNAQATFNQATLNGNTSASGNASFNLSDHAVQNGSLT 
SAGGDTHYTVTRNATQNGNLSLVGNAQATFNQATLNGNTSASDNASFNLSNNAVQNGSLT 



orflng-1 



LSGNAKANVSHSALNGNVSLADKAVFHFESSRFTGQISGGKDTALHLKDSEWTLPSGTEL 
LSDNAKANVSHSALNGNVSLADKAVFHFENSRFTGKISGGKDTALHLKDSEWTLPSGTEL 



orfl-l.pep 
orflng-1 



GNLNLDNATITLNSAYRHDAAGAQTGSATDAPRRRSRRSRRSLLSVTPPTSVESRFNTLT 



GNLNLDNATITLNSAYRHDAAGAQTGSAADAPRRRSR-- 



-R3LLSVTPPTSAESRFNTLT 



.rfl-l.pep 
)rflng-l 



1030 1040 1050 1060 1070 

>rfl-l.pep KPL3ENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKA 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
)rflng-l TPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAGETEAALTAK 
1020 1030 1040 1050 1060 1070 

1080 1090 1100 1110 1120 

)rfl-l.pep EAKKQAEKDNAQSLDALIAAGRDAVEKTESVAEPARQAGGENVGIMQAEEEKKRVQ 

)rflng-l QAQLAAKQQAEKDNAQSLDALIAAGRNATEKAESVAEPARQAGGENAGIMQAEEEKKRVQ 
1080 1090 1100 1110 1120 1130 



orfl-l.pei 
orflng-1 



orf 1-1 . pel 
orflng-1 



orfl-l.pep 
orflng-1 



orf 1-1 .pel 
orflng-1 



orfl-l.pep 
orflng-1 
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In addition, ORFlng shows 55.7% identity with hap protein (P45387) over a 1455aa overlap: 

SCORES Initl; 1104 Initn: 4632 Opt: 2680 



orf lng-1 .pep 
p45387 



MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQARAGHTYFGINYQYYRDFAEN 
MKKTVFRLNFLTACISLGIVSQAWAGHTYFGIDYQYYRDFAEN 



orf lng-1 .pep 
p45387 



KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSVVSRNGVAALAGDQYIVSVAHNGGYN 
KGKFTVGAQNIKVYNKQGQLVGTSMTKAPMIDFSVVSRNGVAALVENQYIVSVAHNVGYT 



orf lng-1 .pep 
p45387 



NVDFGAEGSNPDQHRFSYQIVKRNNYKAGTNGHPYGGDYHMPRLHKFVTDAEPVEMTSYM 
DVDFGAEGNNPDQHRFTYKIVKRNNYKKD-NLHPYEDDYHNPRLHKFVTEAAPIDMTSNM 



)rf lng-1. pep DGWKYADLNKYPDRVRIGAGRQYWRSDEDEPNNRESSYHIASAYSWLVGGNTFAQNGSGG 



NGSTYSDRTKYPERVRIGSGRQFWRNDQDKGD— 



— QVAGAYHYLTAGNTHNQRGAGN 



orf lng-1 . pep GTVNLGSEKIKHSPYGFLPTGGSFGDSGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNGF 
p45387 GYSYLGGDVRKAGEYGPLPIAGSKGDSGSPMFIYDAEKQKWLINGILREGNPFEGKENGF 



or f lng-1 . pep QLVRKDWFYDEIFAGDTHSVFYEPHQNGKYFFNDNNNGAGKIDAKHKHYSLPYRLKTRTV 
p45387 QLVRKSYF-DEIFERDLHTSLYTRAGNGVYTISGNDNGQGSITQKS— GIPSEIK— I 



QLFNVSLSETAREPVYHAA-GGVNSYRPRLNNGENISFIDKGKGELILTSNINQGAGGLY 
TLANMSLPLKEKDKVHNPRYDGPNIYSPRLNNGETLYFMDQKQGSLIFASDINQGAGGLY 



orf lng-1. pep FEGNFTVSPKNNETWQGAGVHISDGSTVTWKVNGVANDRLSKIGKGTLLVQAKGENQGSV 
p453 87 FEGNFTVSPNSNQTWQGAGIHVSENSTVTWKVNGVEHDRLSKIGKGTLHVQAKGENKG3I 



irf lng-1. pep SVGDGKVILDQQADDQGKKQAFSEIGLVSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNG 
)4 5387 SVGDGKVILEQQADDQGNKQAFSEIGLVSGRGTVQLNDDKQFDTDKFYFGFRGGRLDLNG 



HSLSFHRIQNTDEGAMIVNHNQDKESTVTITGNKDITT-TGNN-NNLDSKKEIAYNGWFG 
I I I : I : I I I I I I I I I I I I I I I : :: I I I I I I :: I : : I I I I : I I : I I I I I I I I I I 
HSLTFKRIQNTDEGAMIVNHNTTQAANVTITGNESIVLPNGNNINKLDYRKEIAYNGWFG 
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orf Ing-l . pep EKDATKTNGRLNLNYQPEEADRTLLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLGSG 
p45387 ETDKNKHNGRLNLIYKPTTEDRTLLLSGGTNLKGDITQTKGKLFFSGRPTPHAYNHLNKR 



orf lng-1 . pep WSKMEGIPQGEIVWDNDWIDRTFKAENFHIQGGQAWSRNVAKVEGDWHLSNHAQAVFGV 
p45387 WSEMEGIPQGEIVWDHDWINRTFKAENFQIKGGSAWSRNVSSIEGNWTVSNNANATFGV 



>rf lng-1 . pep APHQSHTICTRSDWTGLTSCTEKTITDDKVIASLSKTDIRGNVSLADHAHLNLTGLATLN 
145387 VPNQQNTICTRSDWTGLTTCQKVDLTDTKVINSIPKTQINGSINLTDNATANVKGLAKLN 



790 



820 



830 



orf lng-1. pep GNLSAGGDTHYTVTRNATQNGNLSLVGNAQATFNQATLNGNTSASDNASFNLSNNAVQNG 

p45387 GNVTL TNHSQFTLSNNATQIG 

750 760 770 

840 850 860 870 880 890 

orf lng-1 . pep SLTLSDNAKANVSHSALNGNVSLADKAVFHFENSRFTGKISGGKDTALHLKDSEWTLPSG 

p45387 NIRLSDNSTATVDNANLNGNVHLTDSAQFSLJCNSHFSHQIQGDKGTTVTLENATWTMPSD 
780 790 800 810 820 830 

900 910 920 930 940 950 

orf lng-1 . pep TELGNLNLDNATITLNSAYRHDAAGAQTGSAADAPRRRSRRSLLSVTPPTSAESRFNTLT 

p4 5387 TTLQNLTLNNSTITLNSAY SASSNNTPRRRS LETETTPTSAEHRFNTLT 



850 



870 



p45387 



VNGKLSGQGTFQFTSSLFGYKSDKLKLSNDAEGDYILSVRNTGKEPETLEQLTLVESKDN 



1020 1030 1040 1050 1060 1070 

orf lng-1. pep TPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAGETEAALTAK 
I i I : : I : I I ! : t : ! i ! I I 1 I ! : I : : : I I I M ! I t I : I i I I I : I : I : : i : I II 
p45387 QPLSDKLKFTLENDHVDAGALRYKLVKNDGEFRLHNPIKEQELHNDLVRAEQAERTLEAK 



1260 1270 1280 1290 1300 1310 

MQKNLGSGRVGILFSHNRTGNTFDDGIGNSARLAHGAVFGQYGIGRFDIGISAGAGFSSG 

VQKALANGRIGAVFSHSRSDNTFDEQVKNHATLTMMSGFAQYQWGDLQFGVNVGTGISAS 
1180 1190 1200 1210 1220 1230 



wo 99/24578 



-376- 



PCT/IB98/01665 



orf lng-1 .pi 
P45387 



orf lng-1 .pep 
P45387 

1440 1450 1460 1469 

orf lng-1 .pep GVNAEIKGFTLSLHAAAAKGPQLEAQHSAGIKLGYRWX 

p4 53 8 7 GLKAEILHFQISAFI SKSQGSQLGKQQNVGVKLGYRW 

1360 1370 1380 1390 

Based on this analysis, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 78 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 655>: 

1 . .AAGGTGTGGC AATTTGTCGA AGA.CCGCTG CGTGCCGTCG TGCCTGCCGA 

51 CAGTTTTGAA. CCGACCGCGC AAAAATTGAA CCTGTTTAAG GCGGGTGCGG 

101 CAACCATTTT GTTTTATGAA GATCAAAATG TCGTCAAAGG TTTGCAGGAG 

151 CAGTTCCCTG CTTATGCCGC TAACTTCCCC GTTTGGGCGg ATCAGGCAAA 

201 CGCGATGGTG CAGTATGCCG TTTGGACGAC ACTTGCCGCG GTCGGCGTAG 

251 GTGCAAACCT GCAACATTAC AATCCCTTGC CCGATGCGGC GATTGCCAAA 

301 GCGTGGAATA TCCCCGAAAA CTGGTTGTTG CGCGCACAAA TGGTTATCGG 

351 CGGTATTGAA GGGGCGGCAG GTGAAAAGAC CTTTGAACCC GTTGCAGAAC 

4 01 GTTTGAAAGT GTTCGGCGCA TAA 

This corresponds to the amino acid sequence <SEQ ID 656; 0RF6>: 

1 ..KVWQFVEXPL RAWPADSFE PTAQKLNLFK AGAATILFYE DQNWKGLQE 
51 QFPAYAANFP WADQANAMV QYAWTTLAA VGVGANLQHY NPLPDAAIAK 
101 AWNIPENWLL RAQMVIGGIE GAAGEKTFEP VAERLKVFGA * 

Further sequence analysis revealed a further partial DNA sequence <SEQ ID 657>: 

1 . . CTGCGTGCCG TCGTGCCTGC CGACAGTTTT GAACCGACCG CGCAAAAATT 

51 GAACCTGTTT AAGGCGGGTG CGGCAACCAT TTTGTTTTAT GAAGATCAAA 

101 ATGTCGTCAA AGGTTTGCAG GAGCAGTTCC CTGCTTATGC CGCTAACTTC 

151 CCCGTTTGGG CGGATCAGGC AAACGCGATG GTGCAGTATG CCGTTTGGAC 

201 GACACTTGCC GCGGTCGGCG TAGGTGCAAA CCTGCAACAT TACAATCCCT 

251 TGCCCGATGC GGCGATTGCC AAAGCGTGGA ATATCCCCGA AAACTGGTTG 

301 TTGCGCGCAC AAATGGTTAT CGGCGGTATT GAAGGGGCGG CAGGTGAAAA 

351 GACCTTTGAA CCCGTTGCAG AACGTTTGAA AGTGTTCGGC GCATAA 

This corresponds to the amino acid sequence <SEQ ID 658; 0RF6-1>: 

1 ..LRAVVPADSF EPTAQKLNLF KAGAATILFY EDQNWKGLQ EQFPAYAANF 
51 PVWADQANAM VQYAVWTTLA AVGVGANLQH YNPLPDAAIA KAWNIPENWL 

101 LRAQMVIGGI EGAAGEKTFE PVAERLKVFG A* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

0RF6 shows 98.6% identity over a 140aa overlap with an ORF (0RF6a) from sfrain A of N. 
meningitidis: 



wo 99/24578 



-377- 



PCT/IB98/01665 



or f 6 . pep KVWQFVEXPLRAWPADSFEPTAQKLNLFK 

orf6a QIVEHAVLHTPSSFNSQSARWVLFGEEHDKVWQFVEDALRAWPADSFEPTAQKLNLFK 
40 50 60 70 80 90 

40 50 60 70 80 90 

orf 6 . pep AGAATILFYEDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHY 

orf 6a AGAATILFYEDQNWKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHY 
100 110 120 130 140 150 



100 110 120 130 140 

orf 6 . pep NPLPDAAIAKAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGAX 

orf 6a NPLPDAAIAKAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGAX 
160 170 180 190 200 

The complete length 0RF6a nucleotide sequence <SEQ ID 659> is: 



1 ATGACCCGTC AATCTCTGCA ACAGGCTGCC GAAAGCCGCC GTTCCATTTA 

51 TTCGTTAAAT AAAAATCTGC CCGTCGGCAA AGATGAAATC GTCCAAATCG 

101 TCGAACACGC CGTTTTGCAC ACACCTTCTT CGTTCAATTC CCAATCTGCC 

151 CGTGTGGTCG TGCTGTTTGG CGAAGAGCAT GATAAGGTGT GGCAATTTGT 

201 CGAAGACGCG CTGCGTGCCG TCGTGCCTGC CGACAGTTTT GAACCGACCG 

251 CGCAAAAATT GAACCTGTTT AAGGCGGGTG CGGCAACTAT TTTGTTTTAT 

301 GAAGATCAAA ATGTCGTCAA AGGTTTGCAG GAGCAGTTCC CTGCTTATGC 

351 CGCCAACTTT CCCGTTTGGG CGGACCAGGC GAACGCGATG GTGCAGTATG 

401 CCGTTTGGAC GACACTTGCC GCGGTCGGCG TAGGTGCAAA CCTGCAACAT 

451 TACAATCCCT TGCCCGATGC GGCGATTGCC AAAGCGTGGA ATATCCCCGA 

501 AAACTGGTTG TTGCGCGCAC AAATGGTTAT CGGCGGTATT GAAGGGGCGG 

551 CAGGTGAAAA GACCTTTGAA CCAGTTGCAG AACGTTTGAA AGTGTTCGGC 

601 GCATAA 

This is predicted to encode a protein having amino acid sequence <SEQ ID 660>; 



1 MTRQSLQQAA ESRRSIYSLN KNLPVGKDEI VQIVEHAVLH TPSSFNSQSA 

51 RWVLFGEEH DKVWQFVEDA LRAVVPADSF EPTAQKLNLF KAGAATILFY 

101 EDQNWKGLQ EQFPAYAANF PVWADQANAM VQYAVWTTLA AVGVGANLQH 

151 YNPLPDAAIA KAWNIPENWL LRAQMVIGGI EGAAGEKTFE PVAERLKVFG 

201 A* 



0RF6a and 0RF6-1 show 100.0% identity in 131 aa overlap: 

50 60 70 80 90 100 

orf 6a . pep TPSSFNSQSARVWLFGEEHDKVWQFVEDALRAWPADSFEPTAQKLNLFKAGAATILFY 

orf 6-1 LRAWPADSFEPTAQKLNLFKAGAATILFY 

10 20 30 



110 120 130 140 150 160 

orf 6a . pep EDQNWKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHYNPLPDAAIA 

I ! I I I I I I I I I ! I I I I I I I I I I I I I I 1 ! I ] I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

orf 6-1 EDQNWKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHYNPLPDAAIA 
40 50 60 70 80 90 

170 180 190 200 

orf 6a. pep KAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGAX 

orf 6-1 KAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGAX 
100 110 120 130 

Homology with a predicted ORF from N.gonorrhoeae 

0RF6 shows 95.7% identity over a 140aa overlap with a predicted ORF (0RF6ng) from 



N. gonorrhoeae: 
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orf6 pep KVWQFVEXPLRAWPADSFEPTAQKLNLFK 30 

I I I I I I I I I I I I I I I I I I I I I II I : I I I 
orf 6ng SNVSLDMSNPTVLRMGLPLYIASLRRGAIYKVWQFVEDALRAWPADSFEPTAQKLKLFK 64 

5 orf6 pep AGAATILFYEDQNVVKGLQEQFPAYAANFPWADQANAMVQYAVWTTLAAVGVGANLQHY 90 

M II I I I I II II II I II I I I I I I II I I I II II I I I I I I I M I II I I I II II I : II I I I II 
orf 6ng AGAATILFYEDQNWKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGAGANLQHY 124 

orf6 pep NPLPDAAIAKAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGA 140 

10 II I I I : II II I M I I I I I I M I I I I I I i I I II II II : I II II I II II I I I 

orf6ng NPLPDVAIAKAWNIPENWLLRAQMVIGGIEGAAGEKVFEPVAERLKVFGA 174 

The complete length 0RF6ng nucleotide sequence <SEQ ID 661 > was identified as: 

1 ATGGCCGTTG CGTCAAATGT CAGCTTGGAT ATGTCCAATC CTACGGTGTT 

15 51 ACGCATGGGA TTACCCTTAT ATATTGCGTC CCTAAGAAGG GGCGCAATAT 

101 ATAAGGTGTG GCAATTTGTC GAAGACGCGC TGCGTGCCGT CGTGCCTGCC 

151 GACAGTTTTG AACCGACCGC GCAAAAATTG AAGCTGTTTA AGGCGGGCGC 

201 GGCAACCATT TTGTTTTATG AAGATCAAAA TGTCGTCAAA GGTTTGCAGG 

251 AGCAGTTCCC TGCTTATGCC GCCAACTTTC CCGTTTGGGC GGACCAGGCG 

20 301 AACGCTATGG TACAGTATGC CGTCTGGACG ACACTTGCCG CGGTCGGTGC 

351 AGGTGCAAAT CTGCAACATT ACAACCCCTT GCCCGATGTG GCGATTGCTA 

4 01 AAGCGTGGAA TATTCCCGAA AACTGGCTGT TGCGCGCGCA AATGGTTATC 

451 GGTGGTATTG AAGGGGcggc aggtgaaaaa gtctttgaac CCGTTGCgga 

501 acgtttgAAA GTGTTCGGCG CATAA 

25 This encodes a protein having amino acid sequence <SEQ ID 662>: 

1 MAVASNVSLD MSNPTVLRMG LPLYIASLRR GAIYKVWQFV EDALRAVVPA 
51 DSFEPTAQKL KLFKAGAATI LFYEDQNWK GLQEQFPAYA ANFPVWADQA 
101 NAMVQYAVWT TLAAVGAGAN LQHYNPLPDV AIAKAWNIPE NWLLRAQMVI 
151 GGIEGAAGEK VFEPVAERLK VFGA* 

30 

0RF6ng and 0RF6-1 show 96.9% identity in 131 aa overlap: 

10 20 30 

or f 6-1 . pep LRAWPADSFEPTAQKLNLFKAGAATILFY 
35 orf6ng PTVLRMGLPLYIASLRRGAIYKVWQFVEDALRAWPADSFEPTAQKLKLFKAGAATILFY 



orf 6-1 pep EDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHYNPLPDAAIA 
40 I I II I II I II II II I II I II I I II I I I I I I I II I II II II I II : I I II II II I I I I : I I I 

orf 6ng EDQNWKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGAGANLQHYNPLPDVAIA 
80 90 100 110 120 130 

100 110 120 130 

45 orf 6-1. pep KAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGAX 

orf 6ng KAWNIPENWLLRAQMVIGGIEGAAGEEWFEPVAERLKVFGAX 
140 150 160 170 

50 It is predicted that the proteins from N. meningitidis and N.gonorrhoeae, and their epitopes, could 
be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 79 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 663> 

1 . . GGCTACAACT ACCTGTTCGC GCGCGGCAGC CGCATCGCCA ACTACCAAAT 

55 51 CAACGGCATC CCCGTTGCCG ACGCGCTGGC CGATACGGGt CAATGCCAAC 

101 ACCGCCGCCT ATGAGCGCGT AGAAGTCGTG CGCGGCGTGG CGGGGCTGCT 

151 GGACGGCACG GGCGAGCCTT CCGCCACCGT CAATCTGGTG CGCAAACGCC 

201 TGACCCGCAA GCCATTGTTT GAAGTCCGCG CCGAAGCgGG CAACCGcAAA 
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251 CATTTCGGGC TGGACGCGGA CGTATCGGGC AGCCTGAACA CCGAAG.crC 

301 rCTGCGCgGC CGCCTGGTTT CCAcCTTCGG ACGCGGCGAC TCGTGGCGGC 

351 GGCGCGAACG CAGCCGskAT GCCGAACTCT ACGGCATTTT GGAATACGAC 

401 ATCGCACCGC AAACCCGCGT CCACGCArGC ATGGACTACC AGCAGGCGAA 

451 AGAAACCGCC GACGCGCCGC TCAGcTACGC CGTGTACGAC AGCCAAGGTT 

501 ATGCCACCGC CTTCGGCCCG AAAGACAACC CCGCCACAAA TTGGGCGAAC 

551 AGCCACCACC GTGCGCTCAA CCTGTTCGCC GGCATCGAAC ACCGCTTCAA 

601 CCAAGACTGG AAACTCAAAG CCGAATACGA CTAC. 

This corresponds to the amino acid sequence <SEQ ID 664; ORF23>: 

1 ..GYNYLFARGS RIANYQINGI PVADALADTG NANTAAYERV EWRGVAGLL 

51 DGTGEPSATV NLVRKRLTRK PLFEVRAEAG NRKHFGLDAD VSGSLNTEXX 

101 LRGRLVSTFG RGDSWRRRER SRXAELYGIL EYDIAPQTRV HAXMDYQQAK 

151 ETADAPLSYA VYDSQGYATA FGPKDNPATN WANSHHRALN LFAGIEHRFN 

201 QDWKLKAEYD Y. . 

Further work revealed the complete nucleotide sequence <SEQ ID 665>: 

1 ATGACACGCT TCAAATATTC CCTGCTGTTT GCCGCCCTGT TGCCCGTGTA 

51 CGCGCAGGCC GATGTTTCTG TTTCAGACGA CCCCAAACCG CAGGAAAGCA 

101 CTGAATTGCC GACCATCACC GTTACCGCCG ACCGCACCGC GAGTTCCAAC 

151 GACGGCTACA CTGTTTCCGG CACGCACACC CCGCTCGGGC TGCCCATGAC 

201 CCTGCGCGAA ATCCCGCAGA GCGTCAGCGT CATCACATCG CAACAAATGC 

251 GCGACCAAAA CATCAAAACG CTCGACCGCG CCCTGTTGCA GGCGACCGGC 

301 ACCAGCCGCC AGATTTACGG CTCCGACCGC GCGGGCTACA ACTACCTGTT 

351 CGCGCGCGGC AGCCGCATCG CCAACTACCA AATCAACGGC ATCCCCGTTG 

4 01 CCGACGCGCT GGCCGATACG GGCAATGCCA ACACCGCCGC CTATGAGCGC 

4 51 GTAGAAGTCG TGCGCGGCGT GGCGGGGCTG CTGGACGGCA CGGGCGAGCC 

501 TTCCGCCACC GTCAATCTGG TGCGCAAACG CCTGACCCGC AAGCCATTGT 

551 TTGAAGTCCG CGCCGAAGCG GGCAACCGCA AACATTTCGG GCTGGACGCG 

601 GACGTATCGG GCAGCCTGAA CACCGAAGGC ACGCTGCGCG GCCGCCTGGT 

651 TTCCACCTTC GGACGCGGCG ACTCGTGGCG GCGGCGCGAA CGCAGCCGCG 

701 ATGCCGAACT CTACGGCATT TTGGAATACG ACATCGCACC GCAAACCCGC 

751 GTCCACGCAG GCATGGACTA CCAGCAGGCG AAAGAAACCG CCGACGCGCC 

801 GCTCAGCTAC GCCGTGTACG ACAGCCAAGG TTATGCCACC GCCTTCGGCC 

851 CGAAAGACAA CCCCGCCACA AATTGGGCGA ACAGCCGCCA CCGTGCGCTC 

901 AACCTGTTCG CCGGCATCGA ACACCGCTTC AACCAAGACT GGAAACTCAA 

951 AGCCGAATAC GACTACACCC GCAGCCGCTT CCGCCAGCCC TACGGCGTAG 

1001 CAGGCGTGCT TTCCATCGAC CACAACACCG CCGCCACCGA CCTGATTCCC 

1051 GGTTATTGGC ACGCCGACCC GCGCACCCAC AGCGCCAGCG TGTCATTGAT 

1101 CGGCAAATAC CGCCTGTTCG GCCGCGAACA CGATTTAATC GCGGGTATCA 

1151 ACGGTTACAA ATACGCCAGC AACAAATACG GCGAACGCAG CATCATCCCC 

1201 AACGCCATTC CCAACGCCTA CGAATTTTCC CGCACGGGTG CCTACCCGCA 

1251 GCCTGCATCG TTTGCCCAAA CCATCCCGCA ATACGGCACC AGGCGGCAAA 

1301 TCGGCGGCTA TCTCGCCACC CGTTTCCGCG CCGCCGACAA CCTTTCGCTG 

1351 ATTTTGGGCG GACGATACAC CCGTTACCGC ACCGGCAGCT ACGACAGCCG 

1401 CACACAAGGC ATGACCTATG TGTCCGCCAA CCGTTTCACC CCCTACACAG 

1451 GCATCGTGTT CGACCTGACC GGCAACCTGT CTCTTTACGG CTCGTACAGC 

1501 AGCCTGTTCG TCCCGCAATC GCAAAAAGAC GAACACGGCA GCTACCTGAA 

1551 ACCCGTAACC GGCAACAATC TGGAAGCCGG CATCAAAGGC GAATGGCTTG 

1601 AAGGCCGTCT GAACGCATCC GCCGCCGTGT ACCGCGCCCG TAAAAACAAC 

1651 CTCGCCACCG CAGCAGGACG CGACCCGAGC GGCAACACCT ACTACCGCGC 

1701 CGCCAACCAA GCCAAAACCC ACGGCTGGGA AATCGAAGTC GGCGGCCGCA 

1751 TCACGCCCGA ATGGCAGATA CAGGCAGGTT ACAGCCAAAG CAAAACCCGC 

1801 GACCAAGACG GCAGCCGCCT GAACCCCGAG AGCGTACCCG AACGCAGCTT 

1851 CAAACTCTTC ACTGCCTACC ACTTTGCCCC CGAAGCCCCC AGCGGCTGGA 

1901 CCATCGGCGC AGGCGTGCGC TGGCAGAGCG AAACCCACAC CGACCCTGCC 

1951 ACGCTCCGCA TCCCCAACCC CGCCGCCAAA GCCCGCGCCG CCGACAACAG 

2001 CCGCCAAAAA GCCTACGCCG TCGCCGACAT CATGGCGCGT TACCGCTTCA 

2051 ATCCGCGCGC CGAACTGTCG CTGAACGTGG ACAATCTGTT CAACAAACAC 

2101 TACCGCACCC AGCCCGACCG CCACAGCTAC GGCGCACTGC GGACAGTGAA 

2151 CGCGGCGTTT ACCTATCGGT TTAAATAA 

This corresponds to the amino acid sequence <SEQ ID 666; ORF23-l>: 

1 MTRFKYSLLF AALLPVYAQA DVSVSDDPKP QESTELPTIT VTADRTASSN 

51 DGYTVSGTHT PLGLPMTLRE IPQSVSVITS QQMRDQNIKT LDRALLQATG 

101 TSRQIYGSDR AGYNYLFARG SRIANYQING IPVADALADT GNANTAAYER 

151 VEWRGVAGL LDGTGEPSAT VNLVRKRLTR KPLFEVRAEA GNRKHFGLDA 

201 DVSGSLNTEG TLRGRLVSTF GRGDSWRRRE RSRDAELYGI LEYDIAPQTR 
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251 VHAGMDYQQA KETADAPLSY AVYDSQGYAT AFGPKDNPAT NWANSRHRAL 

301 NLFAGIEHRF NQDWKLKAEY DYTRSRFRQP YGVAGVLSID HNTAATDLIP 

351 GYWHADPRTH SASVSLIGKY RLFGREHDLI AGINGYKYAS NKYGERSIIP 

401 NAIPNAYEFS RTGAYPQPAS FAQTIPQYGT RRQIGGYLAT RFRAADNLSL 

451 ILGGRYTRYR TGSYDSRTQG MTYVSANRFT PYTGIVFDLT GNLSLYGSYS 

501 SLFVPQSQKD EHGSYLKPVT GNNLEAGIKG EWLEGRLNAS AAVYRARKNN 

551 LATAAGRDPS GNTYYRAANQ AKTHGWEIEV GGRITPEWQI QAGYSQSKTR 

601 DQDGSRLNPD SVPERSFKLF TAYHFAPEAP SGWTIGAGVR WQSETHTDPA 

651 TLRIPNPAAK ARAADNSRQK AYAVADIMAR YRFNPRAELS LNVDNLFNKH 

701 YRTQPDRHSY GALRTVNAAF TYRFK* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with the ferric-pseudobactin receptor PupB of Pseudomonas putida (accession number P38047) 
ORF23 and PupB 

Orf23 5 

PupB 215 

Orf23 66 

PupB 27 4 

Orf23 126 

PupB 334 

Orf23 184 

PupB 392 

Homology with a 



protein show 32% aa identity in 205 aa overlap: 

FARGSRIANYQINGIPVADALADTGNANTAAYERVEWRGVAGLLDGTGEPSATVNLVRK 65 
++RG I NY+++G+P + L D + + A ++RVE+VRG GL+ G G PSAT+NL+RK 
WSRGFAIQNYEVDGVPTSTRL-DNYSQSMAMFDRVEIVRGATGLISGMGNPSATINLIRK 273 

RLTRKPLFEVRAEAGNRKHFGLDADVSGSLNTEXXLRGRLVSTFXXXXXXXXXXXXXXAE 125 
R T + + EAGN +G DVSG L +RGR V+ + 

RPTAEAQASITGEAGNWDRYGTGFDVSGPLTETGNIRGRFVADYKTEKAWIDRYNQQSQL 333 

LYGILEYDIAPQTRVHAXMDYQQAKETADAPLSYAVYD--SQGYATAFGPKDNPATNWAN 183 
+YGI E+D++ T + Y + D+PL + S G T N A +W+ 

MYGITEFDLSEDTLLTVGFSY— LRSDIDSPLRSGLPTRFSTGERTNLKRSLNAAPDWSY 391 

SHHRALNLFAGIEHRFNQDWKLKAE 208 
+ H +FIE+ WKE 
NDHEQTSFFT3IEQQLGNGWSGKIE 416 

predicted ORF from N.meninsitidis (strain A") 



ORF23 shows 95.7% identity over a 21 laa overlap with an ORF (ORF23a) from strain A of A^. 
meningitidis: 



orf23.pep 
orf23a 



GYNYLFARGSRIANYQINGIPVADALADTG 
I I I I I M I I I I I I I I I I M I I I I 1! I I I I I 
QMRDQNIKALDRALLQATGTSRQIYGSDRAGYNYLFARGSRIANYQINGIPVADALADTG 



orf 23 . pep 
orf23a 



NANTAAYERVEWRGVAGLLDGTGEPSATVNLVRKRLTRKPLFEVRAEAGNRKHFGLDAD 
NANTAAYERVEWRGVAGLLDGTGEPSATVNLVRKRPTRKPLFEVRAEAGNRKHFGLGAD 



orf 2 3. pep 
orf23a 



VSGSLNTEXXLRGRLVSTFGRGDSWRRRERSRXAELYGILEYDIAPQTRVHAXMDYQQAK 
I I I I I I : I : I I ! I I I I I I I I I I I I I : 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I M 
VSGSLNAEGTLRGRLVSTFGRGDSWRQRERSRDAELYGILEYDIAPQTRVHAGMDYQQAK 



orf23 .pep 
OEf23a 



ETADAPLSYAVYDSQGYATAFGPKDNPATNWANSHHRALNLFAGIEHRFNQDWKLKAEYD 
I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 1 M I I 1 : I I I I 1 I I I I I I I I I I I I I I I I I I I I 
ETADAPLSYAVYDSQGYATAFGPKDNPATNWANSRHRALNLFAGIEHRFNQDWKLKAEYD 



YTRSRFRQPYGVAGVLSIDHNTAATDLIPGYWHADPRTHSASVSLIGKYRLFGREHDLIA 



The complete length ORF23a nucleotide sequence <SEQ ID 667> is: 
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1 ATGACACGCT TCAAATATTC CCTGCTGTTT GCCGCCCTGT TGCCCGTGTA 

51 CGCGCAGGCC GATGTTTCTG TTTCAGACGA CCCAAAACCG CAGGAAAGCA 

101 CTGAATTGCC GACCATCACC GTTACCGCCG ACCGCACCGC GAGTTCCAAC 

151 GACGGCTACA CTGTTTCCGG CACGCACACC CCGCTCGGGC TGCCCATGAC 

201 CCTGCGCGAA ATCCCGCAGA GCGTCAGCGT CATCACATCG CAACAAATGC 

251 GCGACCAAAA CATCAAAGCG CTCGACCGCG CCCTGTTGCA GGCGACCGGC 

301 ACCAGCCGCC AGATTTACGG CTCCGACCGC GCGGGCTACA ACTACCTGTT 

351 CGCGCGCGGC AGCCGCATCG CCAACTACCA AATCAACGGC ATCCCCGTTG 

401 CCGACGCGCT GGCCGATACG GGCAATGCCA ACACCGCCGC CTATGAGCGC 

451 GTAGAAGTCG TGCGCGGCGT GGCGGGGCTG CTGGACGGCA CGGGCGAGCC 

501 TTCCGCCACC GTCAATCTGG TGCGCAAACG CCCGACCCGC AAGCCATTGT 

551 TTGAAGTCCG CGCCGAAGCG GGCAACCGCA AACATTTCGG GCTGGGCGCG 

601 GACGTATCGG GCAGCCTGAA TGCCGAAGGC ACGCTGCGCG GCCGCCTGGT 

651 TTCCACCTTC GGACGCGGCG ACTCGTGGCG GCAGCGCGAA CGCAGCCGCG 

701 ATGCCGAACT CTACGGCATT TTGGAATACG ACATCGCACC GCAAACCCGC 

751 GTCCACGCAG GCATGGACTA CCAGCAGGCG AAAGAAACCG CCGACGCGCC 

801 GCTCAGCTAC GCCGTGTACG ACAGCCAAGG TTATGCCACC GCCTTCGGCC 

851 CGAAAGACAA CCCCGCCACA AATTGGGCGA ACAGCCGCCA CCGTGCGCTC 

901 AACCTGTTCG CCGGCATCGA ACACCGCTTC AACCAAGACT GGAAACTCAA 

951 AGCCGAATAC GACTACACCC GCAGCCGCTT CCGCCAGCCC TACGGCGTAG 

1001 CAGGCGTGCT TTCCATCGAC CACAACACCG CCGCCACCGA CCTGATTCCC 

1051 GGTTATTGGC ACGCCGACCC GCGCACCCAC AGCGCCAGCG TGTCATTAAT 

1101 CGGCAAATAC CGCCTGTTCG GCCGCGAACA CGATTTAATC GCGGGTATCA 

1151 ACGGTTACAA ATACGCCAGC AACAAATACG GCGAACGCAG CATCATCCCC 

1201 AACGCCATTC CCAACGCCTA CGAATTTTCC CGCACGGGTG CCTACCCGCA 

1251 GCCTGCATCG TTTGCCCAAA CCATCCCGCA ATACGGCACC AGGCGGCAAA 

1301 TCGGCGGCTA TCTCGCCACC CGTTTCCGCG CCGCCGACAA CCTTTCGCTG 

1351 ATACTCGGCG GCAGATACAG CCGTTACCGC ACCGGCAGCT ACGACAGCCG 

1401 CACACAAGGC ATGACCTATG TGTCCGCCAA CCGTTTCACC CCCTACACAG 

1451 GCATCGTGTT CGACCTGACC GGCAACCTGT CGCTTTACGG CTCGTACAGC 

1501 AGCCTGTTCG TCCCGCAATC GCAAAAAGAC GAACACGGCA GCTACCTGAA 

1551 ACCCGTAACC GGCAACAATC TGGAAGCCGG CATCAAAGGC GAATGGCTTG 

1601 AAGGCCGTCT GAACGCATCC GCCGCCGTGT ACCGCGCCCG TAAAAACAAC 

1651 CTCGCCACCG CAGCAGGACG CGACCCGAGC GGCAACACCT ACTACCGCGC 

1701 CGCCAACCAA GCCAAAACCC ACGGCTGGGA AATCGAAGTC GGCGGCCGCA 

1751 TCACGCCCGA ATGGCAGATA CAGGCAGGTT ACAGCCAAAG CAAAACCCGC 

1801 GACCAAGACG GCAGCCGCCT GAACCCCGAC AGCGTACCCG AACGCAGCTT 

1851 CAAACTCTTC ACTGCCTACC ACTTTGCCCC CGAAGCCCCC AGCGGCTGGA 

1901 CCATCGGCGC AGGCGTGCGC TGGCAGAGCG AAACCCACAC CGACCCTGCC 

1951 ACGCTCCGCA TCCCCAACCC CGCCGCCAAA GCCCGCGCCG CCGACAACAG 

2001 CCGCCAAAAA GCCTACGCCG TCGCCGACAT CATGGCGCGT TACCGCTTCA 

2051 ATCCGCGCGC CGAACTGTCG CTGAACGTGG ACAATCTGTT CAACAAACAC 

2101 TACCGCACCC AGCCCGACCG CCACAGCTAC GGCGCACTGC GGACAGTGAA 

2151 CGCGGCGTTT ACCTATCGGT TTAAATAA 

This encodes a protein having amino acid sequence <SEQ LD 668>: 

1 MTRFKYSLLF AALLPVYAQA DVSVSDDPKP QESTELPTIT VTADRTASSN 

51 DGYTV3GTHT PLGLPMTLRE IPQSVSVITS QQMRDQNIKA LDRALLQATG 

101 TSRQIYG3DR AGYNYLFARG SRIANYQING IPVADALADT GNANTAAYER 

151 VEWRGVAGL LDGTGEP3AT VNLVRKRPTR KPLFEVRAEA GNRKHFGLGA 

201 DVSGSLNAEG TLRGRLV3TF GRGDSWRQRE RSRDAELYGI LEYDIAPQTR 

251 VHAGMDYQQA KETADAPLSY AVYDSQGYAT AFGPKDNPAT NWANSRHRAL 

301 NLFAGIEHRF NQDWKLKAEY DYTR3RFRQP YGVAGVLSID HNTAATDLIP 

351 GYWHADPRTH SASVSLIGKY RLFGREHDLI AGINGYKYAS NKYGERSIIP 

401 NAIPNAYEFS RTGAYPQPAS FAQTIPQYGT RRQIGGYLAT RFRAADNLSL 

451 ILGGRYSRYR TGSYDSRTQG MTYVSANRFT PYTGIVFDLT GNL3LYGSYS 

501 SLFVPQSQKD EHGSYLKPVT GNNLEAGIKG EWLEGRLNAS AAVYRARKNN 

551 LATAAGRDPS GNTYYRAANQ AKTHGWEIEV GGRITPEWQI QAGYSQSKTR 

601 DQDGSRLNPD SVPERSFKLF TAYHFAPEAP SGWTIGAGVR WQSETHTDPA 

651 TLRIPNPAAK ARAADNSRQK AYAVADIMAR YRFNPRAELS LNVDNLFNKH 

701 YRTQPDRHSY GALRTVNAAF TYRFK* 

ORF23a and ORF23-1 show 99.2% identity in 725 aa overlap; 

10 20 30 40 50 60 

or f 23a . pep MTRFKYSLLFAALLPVYAQADVSVSDDPKPQESTELPTITVTADRTASSNDGYTVSGTHT 

orf23-l MTRFKYSLLFAALLPVYAQADVSVSDDPKPQESTELPTITVTADRTASSNDGYTVSGTHT 
10 20 30 40 50 60 
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)rf23a.pep 
)rf23-l 



PLGLPMTLREIPQSVSVITSQQMRDQNIKALDRALLQATGTSRQIYGSDRaGYNYLFARG 
PLGLPMTLREIPQSVSVITSQQMRDQNIKTLDRALLQATGTSRQIYGSDRAGYNYLFARG 



)rf23a.pep 
)rf23-l 



SRIANYQINGIPVADALADTGNANTAAYERVEWRGVAGLLDGTGEPSATVNLVRKRPTR 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

SRIANYQINGIPVADALADTGNANTAAYERVEWRGVAGLLDGTGEPSATVNLVRKRLTR 



orf23a.pep 
orf23-l 



KPLFEVRAEAGNRKHFGLGADVSGSLNAEGTLRGRLVSTFGRGDSWRQRERSRDAELYGI 
KPLFEVRAEAGNRKHFGLDADVSGSLNTEGTLRGRLVSTFGRGDSWRRRERSRDAELYGI 



irf23a.pep 
.rf23-l 



LEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAFGPKDNPATNWANSRHRAL 
I I I I I I I I I I I I I I I I M I I I I I I I I I I ! I M M I I I I I I I I I M I I I I I I I I I I I I M I 
LEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAFGPKDNPATNWANSRHRAL 



orf23a.pep 
orf23-l 



NLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLSIDHNTAATDLIPGYWHADPRTH 
NLFAGI EHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLS I DHNTAAT DL I PG YWHADPRTH 



orf23a.pep 
orf23-l 



SASVSLIGKYRLFGREHDLIAGINGYKYASNKYGERSIIPNAIPNAYEFSRTGAYPQPAS 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I ! I I I I I I I 
SASVSLIGKYRLFGREHDLIAGINGYKYASNKYGERSIIPNAIPNAYEFSRTGAYPQPAS 



)rf23a.pep 
)rf23-l 



FAQTIPQYGTRRQIGGYLATRFRAADNLSLILGGRYSRYRTGSYDSRTQGMTYVSANRFT 

I I I I I I I I I I i I I ! I M I I I I I I I I I I I I I I I ! I I I : I I ! I I I I I I I I I I I I I I ! I I I I I 
FAQTIPQYGTRRQIGGYLATRFR7SADNLSLILGGRYTRYRTGSYDSRTQGMTYVSANRFT 



)rf23a.pep 

)rf23-l 



PYTGIVFDLTGNLSLYGSYSSLFVPQSQKDEHGSYLKPVTGNNLEAGIKGEWLEGRLNAS 

I I I I I I I I I I 1 I I 1 I I I I I I I I I I I I I I I I I I I I 1 I i I I I I I I I I ! I I I I I I I I I I I I I 
PYTGIVFDLTGNLSLYGSYSSLFVPQSQKDEHGSYLKPVTGNNLEAGIKGEWLEGRLNAS 



AAVYRARKNNLATAAGRDPSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQ3KTR 

i I I I I I i I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I 
AAVYRARKNNLATAAGRDPSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKTR 



irf23a.pep 
.rf23-l 



DQDGSRLNPDSVPERSFKLFTAYHFAPEAPSGWTIGAGVRWQSETHTDPATLRIPNPAAK 
DQDGSRLNPDSVPERSFKLFTAYHFAPEAPSGWTIGAGVRWQ3ETHTDPATLRIPNPAAK 



orf23a.pep 
orf23-l 



ARAADNSRQKAYAVADIMARYRFNPRAELSLNVDNLFNKHYRTQPDRHSYGALRTVNAAF 
ARAADNSRQKAYAVADIMARYRFNPRAELSLNVDNLFNKHYRTQPDRHSYGALRTVNAAF 
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Homologv with a predicted ORF from N. gonorrhoeae 

ORF23 shows 93.4% identity over a 21 laa overlap with a predicted ORF (ORF23.ng) from N. 
gonorrhoeae: 



orf23.pep 


GYNYLFARGSRIANYQINGIPVADALADTGNANTAAYERVEWRGVAGLLD 
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 1 1 i 1 1 1 
SAVDACRIPGYNYLFARGSRIANYQINGIPVADALADTGNANTAAYERVEWRGVAGLPD 


51 


orf23ng 


60 


orf 23 .pep 


GTGEPSATVNLVRKRLTRKPLFEVRAEAGNRKHFGLDADVSGSLNTEXXLRGRLVSTFGR 


111 


orf23ng 


GTGEPSATVNLVRKHPTRKPLFEVRAEAGNRKHFGLGADVSGSLNAEGTLRGRLVSTFGR 


120 


orf23.pep 


GDSWRRRERSRXAELYGILEYDIAPQTRVHAXMDYQQAKETADAPLSYAVYDSQGYATAF 
Mill: II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II II 1 1 1 1 1 1 
GDSWRQLERSRDAELYGILEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAF 


171 


orf23ng 


180 


orf 23. pep 


GPKDNPATNWANSHHRALNLFAGIEHRFNQDWKLKAEYDY 


211 


orf23ng 


GPKDNPATNWSNSRNRALNLFAGIEHRFNQDWKLPCAEYDYTRSRFRQPYGVAGVLSIDHS 


240 



The ORF23ng nucleotide sequence <SEQ ID 669> is predicted to encode a protein comprising 
amino acid sequence <SEQ ID 670>: 



1 SAVDACRIPG YNYLFARGSR lANYQINGIP VADALADTGN ANTAAYERVE 

51 WRGVAGLPD GTGEPSATVN LVRKHPTRKP LFEVRAEAGN RKHFGLGADV 

101 SGSLNAEGTL RGRLVSTFGR GDSWRQLERS RDAELYGILE YDIAPQTRVH 

151 AGMDYQQAKE TADAPLSYAV YDSQGYATAF GPKDNPATNW SNSRNRALNL 

201 FAGIEHRFNQ DWKLKAEYDY TRSRFRQPYG VAGVLSIDHS TAATDLIPGY 

251 WHADPRTHSA SMSLTGKYRL FGREHDLIAG INGYKYASNK YGERSIIPNA 

301 IPNAYEFSRT GAYPQPSSFA QTIPQYDTRR QIGGYLATRF RAADNLSLIL 

351 GGRYSRYRAG SYNSRTQGMT YVSANRFTPY TGIVFDLTGN LSLYGSYSSL 

401 FVPQLQKDEH GSYLKPVTGN NLEADIKGEW LEGRLNASAA VYRARKNNLA 

451 TAAGRDQSGN TYYRAANQAK THGWEIEVGG RITPEWQIQA GYSQSKPRDQ 

501 DGSRLNPDSV PERSFKLFTA YHLAPEAPSG RTIGAGVRRQ GETHTDPAAL 

551 RIPNPAAKAR AVANSRQKAY AVADIMARYR FNPRTELSLN VDNLFNKHYR 

601 TQPDRHSYGA LRTVNAAFTY RFK* 

Further work revealed the complete nucleotide sequence <SEQ ID 671>: 



1 ATGACACGCT TCAAATACTC CCTGCTTTTT GCCGCCCTGC TACCCGTGTA 

51 CGCGCAGGCC GATGTTTCTG TTTCAGACGA CCCCAAACCG CAGGAAAGCA 

101 CCGAATTGCC GACCATCACC GTTACCGCCG ACCGCACCGC GAGTTCCAAC 

151 GACGGCTACA CCGTTTCCGG CACGCACACC CCGTTCGGGC TGCCCATGAC 

201 CCTGCGCGAA ATCCCGCAGA GCGTCAGCGT CATCACATCG CAACAAATGC 

251 GCGACCAAAA CATCAAAACG CTCGACCGCG CCCTGTTGCA GGCGACCGGC 

301 ACCAGCCGCC AGATTTACGG CTCCGACCGC GCGGGCTACA ACTACCTGTT 

351 CGCGCGCGGC AGCCGCATCG CCAACTACCA AATCAACGGC ATCCCCGTTG 

401 CCGACGCGCT GGCCGATACG GGCAATGCCA ACACCGCCGC CTATGAGCGC 

451 GTAGAAGTCG TGCGCGGCGT GGCGGGGCTG CCGGACGGCA CGGGCGAGCC 

501 TTCTGCCACC GTCAATCTGG TACGCAAACA CCCGACCCGC AAGCCATTGT 

551 TTGAAGTCCG CGCCGAAGCC GGCAACCGCA AACATTTCGG GCTGGGCGCG 

601 GACGTATCGG GCAGCCTGAA CGCCGAAGGC ACGCTGCGCG GCCGCCTGGT 

651 TTCCACCTTC GGACGCGGCG ACTCGTGGCG GCAGCTCGAA CGCAGCCGCG 

701 ATGCCGAACT CTACGGCATT TTGGAATACG ACATCGCACC GCAAACCCGC 

751 GTCCACGCAG GCATGGACTA CCAGCAGGCG AAAGAAACCG CAGACGCGCC 

801 GCTCAGCTAC GCCGTGTACG ACAGCCAAGG TTATGCCACC GCCTTCGGCC 

851 CAAAAGACAA CCCCGCCACA AATTGGTCGA ACAGCCGCAA CCGTGCGCTC 

901 AACCTGTTCG CCGGCATAGA ACACCGCTTC AACCAAGACT GGAAACTCAA 

951 AGCCGAATAC GACTACACCC GTAGCCGCTT CCGCCAGCCC TACGGTGTGG 

1001 CAGGCGTACT TTCCATCGAC CACAGCACTG CCGCCACCGA CCTGATTCCC 

1051 GGTTATTGGC ACGCcgatcc GCGCACCCAC AGCGCCAGCA TGTCATTGAC 

1101 CGGCAAATAC CgcctGTTCG GCCGCGAGCA CGATTTAATC GCGGGTATCA 

1151 ACGGCTACAA ATACGCCAGC AACAAATACG GCGAACGCAG CATCATTCCC 

1201 AACGCCATTC CCAACGCCTA CGAATTTTCC CGCACGGGCG CCTATCCGCA 

1251 GCCATCATCG TTTGCCCAAA CCATCCCGCA ATACGACACC AGGCGGCAAA 

1301 TCGGCGGCTA TCTCGCCACC CGTTTCCGCG CCGCCGACAA CCTTTCGCTG 

1351 ATACTCGGCG GCAGATACAG CCGCTACCGC GCAGGCAGCT ACAACAGCCG 
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1501 
1551 
1601 
1651 
1701 
1751 



1901 
1951 
2001 
2051 



CACACAAGGC 
GCATCGTGTT 
AGCCTGTTCG 
ACCCGTAACC 
AAGGGCGTCT 
CTCGCCACCG 
CGCCAACCAA 
TCACGCCCGA 
GACCAAGACG 
CAAACTCTTC 
CCATcggTGC 
GCGCTCCGCA 
CCGCCAGAAA 
ATCCGCGCAC 
TACCGCACCC 
CGCGGCGTTT 



ATGACCTATG 
CGATCTGACC 
TCCCGCAATT 
GGCAACAATC 
GAACGCATCC 
CAGCAGGACG 
GCCAAAACCC 
ATGGCAGATA 
GCAGCCGCCT 
ACCGCCTACC 
GGGTGTGCGC 
TCCCCAACCC 
GCCTACGCCG 
CGAACTGTCG 
AGCCCGACCG 
ACCTATCGGT 



TGTCCGCCAA 
GGCAACCTGT 
GCAAAAAGAC 
TGGAAGCCGA 
GCCGCCGTGT 
CGACCAGAGC 
ACGGCTGGGA 
CAGGCAGGCT 
GAACCCCGAC 
ACTTAGCCCC 
CGGCAGGGCG 
CGCCGCCAAA 
TCGCCGACAT 
CTGAACGTGG 
CCACAGCTAC 
TTAAATAA 



CCGTTTCACC 
CGCTTTACGG 
GAACACGGCA 
CATCAAAGGC 
ACCGCGCCCG 
GGCAACACCT 
AATCGAAGTC 
ACAGCCAAAG 
AGCGTAcCCG 
CGAAGCCCCC 
AAACCCACAC 
GCCCGCGCCG 
CATGGCGCGT 
ACAACCTGTT 
GGCGCACTGC 



CCCTACACAG 
CTCGTACAGC 
GCTACCTGAA 
GAATGGCTTG 
TAAAAACAAC 
ACTATCGCGC 
GGCGGCCGCA 
CAAACCCCGC 
AACGCAGCTT 
AGCGGCCGGA 
CGACCCAGCC 
TCGCCAACAG 
TACCGCTTCA 
CAACAAACAC 
GGACAGTGAA 



This corresponds to the amino acid sequence <SEQ ID 672; ORF23ng-l>: 



1 MTRFKYSLLF AALLPVYAQA 



DGYTVSGTHT 
TSRQIYGSDR 
VEVVRGVAGL 
DVSGSLNAEG 
VHAGMDYQQA 
NLFAGIEHRF 
GYWHADPRTH 
NAIPNAYEFS 
ILGGRYSRYR 
SLFVPQLQKD 
LATAAGRDQS 
DQDGSRLNPD 
ALRIPNPAAK 
YRTQPDRHSY 



PFGLPMTLRE 
AGYNYLFARG 
PDGTGEPSAT 
TLRGRLVSTF 
KETADAPLSY 
NQDWKLKAEY 
SASMSLTGKY 
RTGAYPQPSS 
AGSYNSRTQG 
EHGSYLKPVT 
GNTYYRAANQ 
3VPERSFKLF 
ARAVAN3RQK 
GALRTVNAAF 



DVSVSDDPKP 
IPQSVSVITS 
SRIANYQING 
VNLVRKHPTR 
GRGDSWRQLE 
AVYDSQGYAT 
DYTRSRFRQP 
RLFGREHDLI 
FAQTIPQYDT 
MTYVSANRFT 
GNNLEADIKG 
AKTHGWEIEV 
TAYHLAPEAP 
AYAVADIMAR 
TYRFK* 



QESTELPTIT 
QQMRDQNIKT 
IPVADALADT 
KPLFEVRAEA 
RSRDAELYGI 
AFGPKDNPAT 
YGVAGVLSID 
AGINGYKYAS 
RRQIGGYLAT 
PYTGIVFDLT 
EWLEGRLNAS 
GGRITPEWQI 
SGRTIGAGVR 
YRFNPRTELS 



VTADRTASSN 
LDRALLQATG 
GNANTAAYER 
GNRKHFGLGA 
LEYDIAPQTR 
NWSNSRNRAL 
HSTAATDLIP 
NKYGERSIIP 
RFRAADNLSL 
GNLSLYGSYS 
AAVYRARKNN 
QAGYSQSKPR 
RQGETHTDPA 
LNVDNLFNKH 



ORF23ng-l and ORF23-1 show 95.9% identity in 725 aa overlap: 



MTRFKYSLLFAALLPVYAQADVSVSDDPKPQESTELPTITVTADRTASSNDGYTVSGTHT 

I I I I 1 I I I I I I I I I I I i I I I I I I I I I I I i I I I I I I I I I I I i I 1 I I I I I I I I I I I I I I I I 
MTRFKYSLLFAALLPVYAQADVSVSDDPKPQESTELPTITVTADRTASSNDGYTVSGTHT 



orf 23-1 . pep PLGLPMTLREIPQSVSVITSQQMRDQNIKTLDRALLQATGTSRQIYGSDRAGYNYLFARG 
I : I I I I I I I I I I I I I I I I I I I I I i I I I I I i I i i I I I I I I I I I I I I I I I I I I I I I I I I I 1 I 
orf23ng-l PFGLPMTLREIPQSVSVITSQQMRDQNIKTLDRALLQATGTSRQIYGSDRAGYNYLFARG 



orf 23-1 . pep SRIANYQINGIPVADALADTGNANTAAYERVEWRGVAGLLDGTGEPSATVNLVRKRLTR 
orf23ng-l SRIANYQINGIPVADALADTGNANTAAYERVEWRGVAGLPDGTGEPSATVNLVRKHPTR 



KPLFEVRAEAGNRKHFGLDADVSGSLNTEGTLRGRLVSTFGRGDSWRRRERSRDAELYGI 
I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I 
KPLFEVRAEAGNRKHFGLGADVSGSLNAEGTLRGRLVSTFGRGDSWRQLERSRDAELYGI 



LEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAFGPKDNPATNWANSRHRAL 
LEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAFGPKDNPATNWSNSRNRAL 



NLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLSIDHNTAATDLIPGYWHADPRTH 
NLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLSIDHSTAATDLIPGYWHADPRTH 
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SASVSLIGKYRLFGREHDLIAGINGYKYASNKYGERSIIPNAIPNAYEFSRTGAYPQPAS 
SASMSLTGKYRLFGREHDLIAGINGYKYASNKYGERSIIPNAIPNAYEFSRTGAYPQPSS 



FAQTIPQYGTRRQIGGYLATRFRAADNLSLILGGRYTRYRTGSYDSRTQGMTYVSANRFT 
FAQTIPQYDTRRQIGGYLATRFRAADNLSLILGGRYSRYRAGSYNSRTQGMTYVSANRFT 



PYTGIVFDLTGNLSLYGSYSSLFVPQSQKDEHGSYLKPVTGNNLEAGIKGEWLEGRLNAS 
PYTGIVFDLTGNLSLYGSYSSLFVPQLQKDEHGSYLKPVTGNNLEADIKGEWLEGRLNAS 



AAVYRARKNNLATAAGRDPSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKTR 
AAVYRARKNNLATAAGRDQSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGY3QSKPR 



DQDGSRLNPDSVPERSFKLFTAYHFAPEAPSGWTIGAGVRWQSETHTDPATLRIPNPAAK 
DQDGSRLNPDSVPERSFKLFTAYHLAPEAPSGRTIGAGVRRQGETHTDPAALRIPNPAAK 



ARAADN S RQKAYAVAD I MAR YR FN PI 



.SLNVDNLFNKHYRTQPDRHSYGALRTVNAAF 

ARAVANSRQKAYAVADIMARYRFNPRTEL3LNVDNLFNKHYRTQPDRHSYGALRTVNAAF 
670 680 690 700 710 720 



orf 23-1 .pep 
orf23ng-l 



TYRFKX 
TYRFKX 



In addition, ORF23ng-l shows significant homology with an OMP from E.coli: 

spiP16869|FHUE_EC0LI OUTER-MEMBRANE RECEPTOR FOR FE ( III ) -COPROGEN , FE(i: 
FERRIOXAMINE B AND FE ( III ) -RHODOTRULIC ACID PRECURSOR >gi 1 1 551542 I gnl I PID | dlOi: 
(D907 45) Outer membrane protein FhuE precursor [Escherichia c< 
>gi|1651545|gnl|PID|dl015405 (D90746) Outer membrane protein FhuE precu: 
[Escherichia coli] >gi 1 1787344 (AE000210) outer-membrane receptor for Fe(i: 
coprogen, Fe (III) -ferrioxamine B and Fe (III) -rhodotrulic acid precu: 
[Escherichia coli] Length =72 9 
Score = 332 bits (843), Expect = 3e-90 

Identities = 228/717 (31%), Positives = 350/717 (48%), Gaps = 60/717 (8%) 

Query: 38 TITVTADRTA3SN— DGYTVSGTHTPFGLPMTLREIPQSVSVITSQQMRDQNIKTLDRAL 95 

T+ V TA + + Y+V+ T + MT R+IPQSV++++ Q+M DQ ++TL + 

Sbjct: 43 TVIVEGSATAPDDGENDYSVTSTSAGTKMQMTQRDIPQSVTIVSQQRMEDQQLQTLGEVM 102 

Query: 96 LQATGTSRQIYGSDRAGYNYLFARG3RIANYQINGIP VADALADTGNANTAA 147 

G S+ SDRA Y ++RG +1 NY ++GIP + DAL+D A 
Sbjct: 103 ENTLGISKSQADSDRALY YSRGFQIDNYMVDGIPTYFESRWNLGDALSDM AL 154 

Query: 148 YERVEWRGVAGLPDGTGEPSATVNLVRKHPTRKPLF-EVRAEAGNRKHFGLGADVSGSL 206 

+ERVEWRG GL GTG PSA +N+VRKH T + +V AE G+ AD+ L 

Sbjct: 155 FERVEWRGATGLMTGTGNPSAAINMVRKHATSREFKGDVSAEYGSWNKERYVADLQSPL 214 

Query: 207 NAEGTLRGRLVSTFGRGDSWRQLERSRDAELYGILEYDIAPQTRVHAGMDYQQAKETADA 266 

+G +R R+V + D3W 3 GI++ D+ T + AG +YQ+ + 

Sbjct: 215 TEDGKIRARIVGGYQNNDSWLDRYNSEKTFFSGIVDADLGDLTTLSAGYEYQRIDVNSPT 274 

Query: 2 67 PLSYAVYDSQGYATAFGPKDNPATNWSNSRNRALNLFAGIEHRFNQDWKLKAEYDYTRSR 32 6 
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327 F— RQPYGVAGVLSIDHSTAA— TDLIPGY WHADPRTHSA-SMSLTGKYRLFG 374 

F + Y A V D ++ PG+ W++ R A + G Y LFG 

335 FDSKMMYVDAYVNKADGMLVGPYSNYGPGFDYVGGTGWNSGKRKVDALDLFADGSYELFG 394 

37 5 REHDLIAGINGYKYASNKYGER— SIIPNAIPNAYEFSRTGAYPQPSSFAQTIPQYDTRR 432 

R+H+L+ G Y +N+Y +1 P+ I + Y F+ G +PQ Q++ Q DT 

395 RQHNLMFG-GSYSKQNNRYF3SWANIFPDEIGSFYNFN — GNFPQTDWSPQSLAQDDTTH 451 

4 33 QIGGYLATRFRAADNLSLILGGRYSRYRAG3YNSRTQGMTY-VSANRFTPYTGIVFDXXX 4 91 

Y ATR AD L LILG RY+ +R + +TY + N TPY G+VFD 

452 MKSLYAATRVTLADPLHLILGARYTNWRVDT LTYSMEKNHTTPYAGLVFDIND 504 

4 92 XXXXXXXXXXXFVPQLQKDEHGSYLKPVTGNNLEADIKGEWLEGRLNASAAVYRARKNNL 551 

F PQ +D G YL P+TGNN E +K +W+ RL + A++R ++N+ 
505 NWSTYASYTSIFQPQNDRDSSGKYLAPITGNNYELGLKSDWMNSRLTTTLAIFRIEQDNV 564 

552 ATAAGR DQSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKPRDQDGSRLN 608 

A + G +G T Y+A + + G E E+ G IT WQ+ G ++ D +G+ +N 

565 AQSTGTPIPGSNGETAYKAVDGTVSKGVEFELNGAITDNWQLTFGATRYIAEDNEGNAVN 624 

609 PDSVPERSFKLFTAYHLAPEAPSGRTIGAGVRRQGETHTDPAALRIPNPAAKARAVANSR 668 

P ++P + K+FT+Y L P P T+G GV Q +TD P RA 
625 P-NLPRTTVKMFTSYRL-PVMPE-LTVGGGVNWQNRVYTDTV TPYGTFRA E 672 

669 QKAYAVADIMARYRFNPRTELSLNVDNLFNKHYRTQPDRH-SYGALRTVNAAFTYRF 724 

Q +YA+ D+ RY+ L NV+NLF+K Y T + YG R + TY+F 

673 QGSYALVDLFTRYQVTKNFSLQGNVNNLFDKTYDTNVEGSIVYGTPRNFSITGTYQF 72 9 



Based on this analysis, it was predicted that these proteins from N. meningitidis and N.gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF23-1 (77.5kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described 
35 above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure 
15A shows the results of affinity purification of the His- fusion protein, and Figure 15B shows the 
results of expression of the GST-flision in E.coli. Purified His-flision protein was used to immunise 
mice, whose sera were used for Western blot (Figure 15C) and for ELISA (positive result). These 
experiments confirm that ORF23-1 is a surface-exposed protein, and that it is a useful immunogen. 



40 Example 80 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 673>: 

1 ATGCGCACGG CAGTGGTTTT GCTGTTGATC ATGCCGATGG CGGCTTCGTC 

51 GGCAATGATG CCGGAAATGG TGTGCGCGGG CGTGTCGCCG GGAACGGCAA 

101 TCATATCCAA GCCGACCGAA CAAACGGCGG TCATGGCTTC GAGTTTGTCC 

45 151 AGCGTCAgcA CGCCTGCTTC GGCGgcGgCa ATCATACCTT CGTCTTCGGA 

2 01 AACGGGGATA AACGcGCCAC TCAAACCCCC GACCGCGCTG GAAGCCATCA 

251 TGCCGCCTTT TTTCACGGCA TCGTTCAGCA ATGCCAAAGC TGCTGTTGTG 

301 CCGTGCGTAC CGCAGACGCT CAAGCCCATT TnTTCAAGAA TGCGTGCCAC 

351 TnAGTCGCCG ACGGGG . . 

50 This corresponds to the amino acid sequence <SEQ ID 674; ORF24>: 

1 MRTAWLLLI MPMAASSAMM PEMVCAGVSP GTAIISKPTE QTAVMASSLS 

51 SVSTPASAAA IIPSSSETGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAW 

101 PCVPQTLKPI XSRMRATXSP TG . . 
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Further work revealed the complete nucleotide sequence <SEQ ID 675>: 



RTGCGCACGG 
GGCAATGATG 
TCATATCCAA 
AGCGTCAGCA 
AACGGGGATA 
TGCCGCCTTT 
CCGTGCGTAC 
TGAGTCGCCG 
ACGGGATATT 
CGGGTAATTT 
TGTCGTTGCA 
ATACGCCGAC 
CCCGCCATAA 
AGCGCAGCCG 
CGCCCGCCAG 
ATATTGATGG 
GGAGCGGATT 
CGGAAAAACC 
AAAGTTTGCG 



CAGTGGTTTT 
CCGGAAATGG 
GCCGACCGAA 
CGCCTGCTTC 
AACGCGCCAC 
TTTCACGGCA 
CGCAGACGCT 
ACGGCGGGGG 
CAGCATTTTT 
TGAAAGCAGT 
TCTGAATTTT 
ATTGATAACG 
ACGGGTTGTC 
AAACCTTCGG 
CTTGACCGCA 
AGCTGCACAC 
AACACCTCAT 
GCCGATAAAA 
CCACGCTGAC 



GCTGTTGATC 
TGTGCGCGGG 
CAAACGGCGG 
GGCGGCGGCA 
TCAAACCCCC 
TCGTTCAGCA 
CAAGCCCATT 
TCGGCGCCAG 
GAGGCTTCGC 
TTTCTTCACT 
CCAACGCGGC 
GCATCCGCTT 
TTCCACCGCG 
GCGTGATTTC 
TCCATATTGA 
AATATCGGTA 
CCGAAGGCGA 
GACACACCGA 
GTAA 



ATGCCGATGG 
CGTGTCGCCG 
TCATGGCTTC 
ATCATACCTT 
GACCGCGCTG 
ATGCCAAAGC 
TCTTCAAGAA 
CGACAAGTCG 
GGCCGATGAG 
ACTTCCGCAA 
TTTTACGACA 
CGCCCGAACC 
TTGCAGAACA 
CGCCGTGCGT 
TACCGGCACG 
GTCTTCATCG 
CATCCCTTTT 
TGGCTTTGGC 



CGGCTTCGTC 
GGAACGGCAA 
GAGTTTGTCC 
CGTCTTCGGA 
GAAGCCATCA 
TGCTGTTGTG 
TGCGTGCCAC 
AGAATACCAA 
TTCGCCCACG 
CTTCGGTCAA 
CCTGGGCCGG 
ATGAAACGCG 
CGACAATTTT 
TTGACGGTTT 
CGTACTGCCG 
CTTCGGGAAT 
TGCACCAACG 
AGCTTTATCC 



This corresponds to the amino acid sequence <SEQ ID 676; ORF24-l>: 

1 MRTAWLLLI MPMAASSAM M PEMVCAGVSP GTAIISKPTE QTAVMASSLS 
51 SVSTPASAAA IIPSSSETGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAW 
101 PCVPQTLKPI SSRMRATESP TAGVGASDKS RIPNGIFSIF EASRPMSSPT 
151 RVILKAVFFT TSATSVNWA SEFSNAAFTT PGPDTPTLIT ASASPEP*NA 
201 PAINGLSSTA LQNTTILAQP KPSGVIS AVR LTVSPASLTA SILI PARVLP 
251 ILMELHTISV VFIA SGMERI NTSSEGDIPF CTNAEKPPIK DTPMALAALS 
301 KVCATLT* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meninsitidis (strain A) 

ORF24 shows 96.4% identity over a 307 aa overlap with an ORF (ORF24a) from strain A of TV. 
meningitidis: 

10 20 30 40 50 60 

orf24a.pep MRTAVVLLLIMPMAASSAMMPEMVCAGVSPGTAIISXPTEQTAVIASSLSNVSTPASAAA 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I : I I I I I : I I I I I I I I I 
orf24 MRTAWLLLIMPMAASSAMMPEMVCAGVSPGTAIISKPTEQTAVMASSLSSVSTPASAAA 



i . pep IIPSSSXTGINAPLKPPTALEAIMPPFFTASFSNAKAAWPCVPQTLKPISSRMRATESP 
IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAWPCVPQTLKPISSRMRATESP 



TAGVGAS DKSRI PNGI FS I FEASRPMS S PTRVI LKAVFFTT SAT S VN WASEFSNAAFTT 

I 1 I I ! I ! I i I I I I I I I I I ! I I i I I I I M I I I ! I I I I I I I I ! I I I I I I I I I I I I I I I I I I I 

TAGVGASDKSRIPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVNWASEFSNAAFTT 



PGPDTPTLITASASPEPXNAPAIXGLSSXALQNTTILAQPKPSSVISXVRLMVSPASLTA 
PGPDTPTLITASASPEPXNAPAINGLSSTALQNTTILAQPKPSGVISAVRLTVSPASLTA 



SILIPARVLPILMELHTISWFIASGMERXNTSSEGDIPFCTSAEKPPIKDTPMALAALS 
SILIPARVLPILMELHTISWFIASGMERINTSSEGDIPFCTNAEKPPIKDTPMALAALS 
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orf24i 
orf24 



The complete length ORP24a nucleotide sequence <SEQ ED 677> is 



ATGCGCACGG 
GGCAATGATG 
TCATATCCAA 
AACGTCAGCA 
NACGGGGATA 
TGCCGCCCTT 
CCGTGCGTAC 
CGAGTCGCCG 
ACGGGATATT 
CGGGTAATTT 
TGTCGTTGCA 
ATACGCCGAC 
CCCGCCATAN 
GGCGCAGCCG 
CGCCCGCCAG 
ATATTGATGG 
GGAACGGATN 
CGGAAAAGCC 
AAAGTTTGCG 



CAGTGGTTTT 
CCGGAAATGG 
NCCGACCGAA 
CGCCTGCTTC 
AACGCGCCAC 
TTTCACGGCA 
CGCAGACGCT 
ACGGCAGGGG 
CAGCATTTTT 
TGAAGGCGGT 
TCCGAATTTT 
ATTAATCACA 
ACGGGTTGTC 
AAACCTTCTA 
TCTGACCGCG 
AGCTGCACAC 
AACACCTCGT 
GCCAATAAAA 
CCACGCTGAC 



GCTGTTGATC 
TGTGCGCGGG 
CAAACGGCGG 
GGCGGCGGCA 
TCAAACCGCC 
TCGTTCAGCA 
CAAACCCATT 
TCGGTGCCAG 
GAGGCTTCGC 
TTTCTTCACA 
CCAACGCGGC 
GCATCCGCTT 
TTCCNCCGCG 
GTGTGATTTC 
TCCATATTGA 
GATATCAGTA 
CAGAAGGCGA 
GACACGCCGA 
GTAA 



ATGCCGATGG 
TGTGTCGCCG 
TCATCGCTTC 
ATCATACCTT 
AACCGCGCTC 
ATGCCAAAGC 
TCTTCAAGAA 
CGACAAGTCG 
GGCCGATGAG 
ACTTCGGCAA 
TTTTACGACA 
CGCCTGAGCC 
TTGCAGAACA 
ANCCGTGCGT 
TACCGGCGCG 
GTCTTCATCG 
CATACCTTTT 
TGGCTTTGGC 



CGGCTTCGTC 
GGAACGGCAA 
GAGTTTATCC 
CGTCTTCGGA 
GAAGCCATCA 
TGCTGTTGTG 
TGCGCGCCAC 
AGAATACCAA 
TTCGCCCACG 
CTTCGGTCAA 
CCCGGGCCGG 
GTGAAACGCG 
CGACGATTTT 
TTGATGGTTT 
CGTACTGCCG 
CTTCGGGAAT 
TGCACCAGCG 
AGCCTTATCC 



This encodes a protein having amino acid sequence <SEQ ID 678>: 

1 MRTAWLLLI MPMAASSAMM PEMVCAGVSP GTAIISXPTE QTAVIASSLS 

51 NVSTPASAAA IIPSSSXTGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAVV 

101 PCVPQTLKPI SSRMRATESP TAGVGASDKS RIPNGIFSIF EASRPMSSPT 

151 RVILKAVFFT TSATSVNWA SEFSNAAFTT PGPDTPTLIT ASASPEP*NA 

201 PAIXGLSSXA LQNTTILAQP KPSSVISXVR LMVSPASLTA SILIPARVLP 

251 ILMELHTISV VFIASGMERX NTSSEGDIPF CTSAEKPPIK DTPMALAALS 

301 KVCATLT* 

It should be noted that this protein includes a stop codon at position 198. 



ORF24a and ORF24-1 show 96.4% identity in 307 aa overlap: 

10 20 30 40 50 60 

orf24a.pep MRTAVVLLLIMPMAASSAMMPEMVCAGVSPGTAIISXPTEQTAVIASSLSNVSTPASAAA 

or f 2 4 -1 MRTAWLLLIMPMAAS SAMMPEMVCAGVS PGTAI I SKPTEQTAVMAS SLSSVSTPASAAA 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf24a.pep IIPSSSXTGINAPLKPPTALEAIMPPFFTASFSNAKAAWPCVPQTLKPISSRMRATESP 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf24-l IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAVVPCVPQTLKPISSRMRATESP 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf24a.pep TAGVGASDKSRIPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVNVVASEFSNAAFTT 

orf24-l TAGVGASDKSRIPNGIFSIFEASRPMSSPTRVILKAVFFTTS.ATSVNWASEFSNAAFTT 
130 140 150 160 170 180 

190 200 210 220 230 240 

orf24a.pep PGPDTPTLITASASPEPXNAPAIXGLSSXALQNTTILAQPKPSSVISXVRLMVSPASLTA 

orf24-l PGPDTPTLITASASPEPXNAPAINGLSSTALQNTTILAQPKPSGVISAVRLTVSPASLTA 
190 200 210 220 230 240 

250 260 270 280 290 300 

orf24a.pep SILIPARVLPILMELHTISWFIASGMERXNTSSEGDIPFCTSAEKPPIKDTPMALAALS 

orf24-l SILIPARVLPILMELHTISWFIASGMERINTSSEGDIPFCTNAEKPPIKDTPMALAALS 
250 260 270 280 290 300 
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KVCATLTX 



Homology with a predicted ORF from N.sonorrhoeae 

ORF24 shows 96.7% identity over a 121 aa overlap with a predicted ORF (ORF24ng) from 
N. gonorrhoeae: 

orf24 .pep MRTAWLLLIMPMAASSAMMPEMVCAGVSPGTAIISKPTEQTAVMASSLS3VSTPASAAA 60 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I : I I I I I I I 
orf24ng MRTAVVLLLIMPMAASSAMMPEMVCAGVSPGTAIMSKPTEQTAVMASSLSSVNTPASAAA 50 

orf24 .pep IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAWPCVPQTLKPIXSRMRATXSP 120 

orf24ng IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAWPCVPQTLKPISSRMRATESP 12 0 



orf24ng TAGVGASDKSRMPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVRLTASEFSSAALTT 

The complete length ORF24ng nucleotide sequence <SEQ ID 679> is: 



ATGCGCACGG 
GGCGATGATG 
TCATGTCCAA 
AGCGTCAACA 
AACGGGGATA 
TGCCGCCCTT 
CCGTGCGTAC 
CGAGTCGCCG 
ACGGGATATT 
CGGGTGATTT 
GCTGACCGCG 
ATACGCCGAC 
CCCGCCATAA 
GGCGCAGCCG 
CGCCTGCCAG 
ATATTGATGG 
■ GGAACGGATC 
CGGAAAAGCC 
AAAGTCTGCG 



CGGTGGTTTT 
CCGGAAATGG 
ACCAACGGAG 
CGCCTGCCTC 
AACGCGCCGC 
TTTCACGGCA 
CGCAGACGCT 
ACGGCGGGGG 
CAGCATTTTT 
TGAAAGCGGT 
TCCGAATTTT 
ATTAATCACA 
ACGGATTGTC 
AAACCTTCGG 
CTTGACCGCA 
AGCTGCACAC 
AACACCTCAT 
GCCGATAAAG 
CCACGCTGAC 



GCTGTTGATC 
TGTGCGCGGG 
CAGACGGCGG 
GGCGGCGGCA 
TCAAACCGCC 
TCGTTCAGCA 
CAAGCCCATT 
TCGGTGCCAG 
GAGGCTTCGC 
TTTCTTCACG 
CCAGCGCGGC 
GCATCCGCTT 
TTCCACCGCG 
GTGTGATTTC 
TCCATATTGA 
GATATCGGTA 
CCGAAGGCGA 
GACACGCCGA 
ATAA 



ATGCCGATGG 
CGTGTCGCCG 
TCATGGCTTC 
ATCATACCTT 
GACCGCGCTG 
ATGCCAAAGC 
TCTTCAAGAA 
CGACAAATCG 
GACCGATGAG 
ACTTCGGCGA 
TTTGACCACG 
CGCCCGAGCC 
TTGCAGAACA 
AGCCGTGCGT 
TACCGGCACG 
GTTTTCATCG 
CATACCTTTT 
TGGCTTTGGC 



CGGCTTCGTC 
GGAACGGCAA 
GAGTTTGTCC 
CGTCTTCGGA 
GAAGCCATCA 
TGCTGTTGTG 
TGCGCGCCAC 
AGAATGCCGA 
TTCGCCCACG 
CCTCGGTCAG 
CCTGGACCGG 
GTGGAACGCA 
CGACGATTTT 
TTGATGGTTT 
CGTGCTGCCG 
CTTCGGGAAC 
TGCACCAGCG 
TGCCTTGTCC 



This encodes a protein having amino acid sequence <SEQ ID 680>: 



1 MRTAWLLLI MPMAASSAMM PEMVCAGVSP GTAIMSKPTE QTAVMASSLS 

51 SVNTPASAAA IIP3SSETGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAW 

101 PCVPQTLKPI SSRMRATESP TAGVGASDKS RMPNGIFSIF EASRPMSSPT 

151 RVILKAVFFT TSATSVRLTA SEFSSAALTT PGPDTPTLIT ASASPEPWNA 

201 PAINGLSSTA LQNTTILAQP KPSGVIS AVR LMVSPASLTA SILI PARVLP 

251 ILMELHTISV VFIA SGTERI NTSSEGDIPF CTSAEKPPIK DTPMALAALS 

301 KVCATLT* 

ORF24ng and ORF24-1 show 96.1% identity in 307 aa overlap: 

10 20 30 40 50 60 

orf24-l.pep MRTAWLLLIMPMAAS SAlfflPEMVCAGVS PGTAI I SKPTEQTAVMA3 SLS SVST PASAAA 

or f 2 4 ng MRTAWLLLIMPMAAS SAMMPEMVCAGVS PGTAIMSKPTEQTAVMAS SLS S VNT PASAAA 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 24-1 . pep IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAWPCVPQTLKPISSRMRATESP 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf24ng IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAWPCVPQTLKPISSRMRATESP 
70 80 90 100 110 120 
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TAGVGASDKSRIPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVNWASEFSNAAFTT 
TAGVGASDKSRMPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVRLTASEFSSAALTT 



PGPDTPTLITASASPEPXNAPAINGLSSTALQNTTILAQPKPSGVISAVRLTVSPASLTA 
PGPDTPTLITASASPEPWNAPAINGLSSTALQNTTILAQPKPSGVISAVRLMVSPASLTA 



SILIPARVLPILMELHTISVVFIASGMERINTSSEGDIPFCTNAEKPPIKDTPMALAALS 
SILIPARVLPILMELHTISVVFIASGTERINTSSEGDIPFCTSAEKPPIKDTPMALAALS 



KVCATLTX 
KVCATLTX 



Based on this analysis, including the presence of a putative leader sequence (first 1 8 aa - double- 
underlined) and putative transmembrane domains (single-underlined) in the gonococcal protein, 
it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could 
be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 81 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 68 1>: 

1 ..ACCGACGTGC AAAAAGAGTT GGTCGGCGAA CAACGCAAGT GGGCGCAGGA 
51 AAAAATCAGC AACTGCCGAC AAGCCGCCGC GCAGGCAGAC CGGCAGGAAT 
101 ACGCCGAATA CCTCAAGCTG CAATGCGACA CGCGGATGAC GCGCGAACGG 
151 ATACAGTATC TTCGCGGCTA TTCCATCGAT TAG 

This corresponds to the amino acid sequence <SEQ ID 682; ORF25>: 

1 ..TDVQKELVGE QRKWAQEKIS NCRQAAAQAD RQEYAEYLKL QCDTRMTRER 
51 IQYLRGYSID * 

Further work revealed the complete nucleotide sequence <SEQ ID 683>: 

1 ATGTATCGGA AACTCATTGC GCTGCCGTTT GCCCTGCTGC TTGCCGCTTG 

51 CGGCAGGGAA GAACCGCCCA AGGCATTGGA ATGCGCCAAC CCCGCCGTGT 

101 TGCAAGGCAT ACGCGGCAAT ATTCAGGAAA CGCTCACGCA GGAAGCGCGT 

151 TCTTTCGCGC GCGAAGACGG CAGGCAGTTT GTCGATGCCG ACAAAATTAT 

201 CGCCGCCGCC TACGGTTTGG CGTTTTCTTT GGAACACGCT TCGGAAACGC 

251 AGGAAGGCGG GCGCACGTTC TGTATCGCCG ATTTGAACAT TACCGTGCCG 

301 TCTGAAACGC TTGCCGATGC CAAGGCAAAC AGCCCCCTGT TGTACGGGGA 

351 AACTGCTTTG TCGGATATTG TGCGGCAGAA GACGGGCGGC AATGTCGAGT 

401 TTAAAGACGG CGTATTGACG GCAGCCGTCC GCTTCCTGCC CGTCAAAGAC 

451 GGTCAGACGG CATTTGTCGA CAACACGGTC GGTATGGCGG CGCAAACGCT 

501 GTCTGCCGCG CTGCTGCCTT ACGGCGTGAA GAGCATCGTG ATGATAGACG 

551 GCAAGGCGGT GAAAAAAGAA GACGCGGTCA GGATTTTGAG CGGAAAAGCC 

601 CGTGAAGAAG AACCGTCCAA ACCCACGCCC GAAGACATTT TGGAACACAA 

651 TGCCGCCGGC GGCGATGCGG GCGTACCCCA AGCCGCAGAA GGCGCGCCCG 

701 AACCGGAAAT CCTGCATCCT GACGACGGCG AGCGTGCCGA TACCGTTACC 

751 GTATCACGGG GCGAAGTGGA AGAGGCGCGC GTACAAAACC AGCGTGCGGA 

801 ATCCGAAATT ACCAAACTTT GGGGAGGACT CGATACCGAC GTGCAAAAAG 

851 AGTTGGTCGG CGAACAACGC AAGTGGGCGC AGGAAAAAAT CAGCAACTGC 

901 CGACAAGCCG CCGCGCAGGC AGACCGGCAG GAATACGCCG AATACCTCAA 

951 GCTGCAATGC GACACGCGGA TGACGCGCGA ACGGATACAG TATCTTCGCG 

1001 GCTATTCCAT CGATTAG 
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This corresponds to the amino acid sequence <SEQ ID 684; ORF25-l>: 

1 MYRKLIALPF ALLLAA CGRE EPPKALECAN PAVLQGIRGN IQETLTQEAR 

51 SFAREDGRQF VDADKIIAAA YGLAFSLEHA SETQEGGRTF CIADLNITVP 

101 SETLADAKAN SPLLYGETAL SDIVRQKTGG NVEFKDGVLT AAVRFLPVKD 

151 GQTAFVDNTV GMAAQTLSAA LLPYGVKSIV MIDGKAVKKE DAVRILSGKA 

201 REEEPSKPTP EDILEHNAAG GDAGVPQAAE GAPEPEILHP DDGERADTVT 

251 VSRGEVEEAR VQNQRAESEI TKLWGGLDTD VQKELVGEQR KWAQEKISNC 

301 RQAAAQADRQ EYAEYLKLQC DTRMTRERIQ YLRGYSID* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF25 shows 98.3% identity over a 60aa overlap with an ORF (ORF25a) from sfrain A oiN. 
meningitidis: 

10 20 30 

orf25 pep TDVQKELVGEQRKWAQEKISNCRQAAAQAD 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
or f 25a VTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEXRKWAQEKISNCRQAAAQAD 
250 260 270 280 290 300 

40 50 50 

orf25 .pep RQEYAEYLKLQCDTRMTRERIQYLRGYSIDX 

orf25a RQEYAEYLKLQCDTRMTRERIQYLRGYSIDX 
310 320 330 

The complete length ORF25a nucleotide sequence <SEQ ID 685> is: 

1 ATGTATCGGA AACTCATTGC GCTGCCGTTT GCCCTGCTGC TTGCCGCTTG 

51 CGGCAGGGAA GAACCGCCCA AGGCATTGGA ATGCGCCAAC CCCGCCGTGT 

101 TGCAANGCAT ACGCNGCAAT ATTCAGGAAA CGCTCACGCA GGAAGCGCGT 

151 TCTTTCGCGC GCGAAGACNG CANGCAGTTT GTCGATGCCG ACNAAATTAT 

201 CGCCGCCGCC TANGNTNNGN NGNTNTCTTT GGAACACGCT TCGGAAACGC 

251 AGGAAGGCGG GCGCACGTTC TGTNTCGCCG ATTTGAACAT TACCGTGCCG 

301 TCTGAAACGC TTGCCGATGC CAAGGCAAAC AGCCCCCTGC TGTACGGGGA 

351 AACCGCTTTG TCGGATATTG TGCGGCAGAA GACGGGCGGC AATGTCGAGT 

401 TTAAAGACGG CGTATTGACG GCAGCCGTCC GCTTCCTACC CGTCAAAGAC 

451 GGTCAGANGG CATTTGTCGA CAACACGGTC GGTATGGCGG CGCAAACGCT 

501 GTCTGCCGCG TTGCTGCCTT ACGGCGTGAA GAGCATCGTG ATGATAGACG 

551 GCAAGGCGGT AAAAAAAGAA GACGCGGTCA GGATTNTGAG CNGANAAGCC 

601 CGTGAANAAG AACCGTCCAA ANCCNNGCCC GAAGACATTT TGGAACATAA 

651 TGCCGCCGGA GGGGATGCAG ACGTACCCCA AGCCGGAGAA GACGCGCCCG 

701 AACCGGAAAT CCTGCATCCT GACGACGGCG AGCGTGCCGA TACCGTTACC 

751 GTATCACGGG GCGAAGTGGA AGAGGCGCGN GTACAAAACC AGCGTGCGGA 

801 ATCCGAAATT ACCAAACTTT GGGGAGGACT CGATACCGAC GTGCAAAAAG 

851 AGTTGGTCGG CGAANAACGC AAGTGGGCGC AGGAAAAAAT CAGCAACTGC 

901 CGACAAGCCG CCGCGCAGGC AGACCGGCAG GAATACGCCG AATACCTCAA 

951 GCTGCAATGC GACACGCGGA TGACGCGCGA ACGGATACAG TATCTTCGCG 

1001 GCTATTCCAT CGATTAG 

This encodes a protein having amino acid sequence <SEQ ID 686>: 

1 MYRKLIALPF ALLLAA CGRE EPPKALECAN PAVLQXIRXN IQETLTQEAR 

51 SFAREDXXQF VDADXIIAAA XXXXXSLEHA SETQEGGRTF CXADLNITVP 

101 SETLADAKAN SPLLYGETAL SDIVRQKTGG NVEFKDGVLT AAVRFLPVKD 

151 GQXAFVDNTV GMAAQTLSAA LLPYGVKSIV MIDGKAVKKE DAVRIXSXXA 

201 REXEPSKXXP EDILEHNAAG GDADVPQAGE DAPEPEILHP DDGERADTVT 

251 VSRGEVEEAR VQNQRAESEI TKLWGGLDTD VQKELVGEXR KWAQEKISNC 

301 RQAAAQADRQ EYAEYLKLQC DTRMTRERIQ YLRGYSID* 

ORF25a and ORF25-1 show 93.5% identity in 338 aa overlap: 

10 20 30 40 50 60 

or f 2 5a. pep MYRKLIALPFALLLAACGREEPPKALECANPAVLQXIRXNIQETLTQEARSFAREDXXQF 

orf25-l MYRKLIALPFALLLAACGREEPPKALECANPAVLQGIRGNIQETLTQEARSFAREDGRQF 
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70 80 90 100 110 120 

orf 25a . pep VDADXIIAAAXXXXXSLEHASETQEGGRTFCXADLNITVPSETLADAKANSPLLYGETAL 

orf25-l VDADKIIAAAYGLAFSLEHASETQEGGRTFCIADLNITVPSETLADAKANSPLLYGETAL 
70 80 90 100 110 120 



130 140 150 160 170 180 

orf 25a . pep SDIVRQKTGGNVEFKDGVLTAAVRFLPVKDGQXAFVDNTVGMAAQTLSAALLPYGVKSIV 
I I I I I I I I I I I ! I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I ! I I I I 
orf 25-1 SDIVRQKTGGNVEFKDGVLTAAVRFLPVKDGQTAFVDNTVGMAAQTLSAALLPYGVKSIV 

130 140 150 160 170 180 



190 200 210 220 230 240 

orf 25a. pep MIDGKAVKKEDAVRIXSXXAREXEPSKXXPEDILEHNAAGGDADVPQAGEDAPEPEILHP 

orf 25-1 MIDGKAVKKEDAVRILSGKAREEEPSKPTPEDILEHNAAGGDAGVPQAAEGAPEPEILHP 
190 200 210 220 230 240 



250 260 270 280 290 300 

orf 25a . pep DDGERADTVTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEXRKWAQEKISNC 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 25-1 DDGERADTVTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEQRKWAQEKISNC 
250 260 270 280 290 300 



310 320 330 339 

or f 2 5a . pep RQAAAQADRQEYAEYLKLQCDTRMTRERIQYLRGYS IDX 

or f 2 5- 1 RQAAAQADRQEYAEYLKLQCDTRMTRERIQYLRGYS IDX 

310 320 330 



Homology with a predicted ORF from N. gonorrhoeae 

ORF25 shows 100% identity over a 60aa overlap with a predicted ORF (ORF25ng) from 
N.gonorrhoeae: 



or f 25 . pep TDVQKELVGEQRKWAQEKISNCRQAAAQAD 

I I I I I I I I I I I I I I I I I I I I I I Ill 

orf25ng VTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEQRKWAQEKISNCRQAAAQAD 

orf 25. pep RQEYAEYLKLQCDTRMTRERIQYLRGYSID 60 

orf25ng RQEYAEYLKLQCDTRMTRERIQYLRGYSID 338 

The complete length ORF25ng nucleotide sequence <SEQ ID 687> is: 



1 ATGTATCGGA AACTCATTGC GCTGCCGTTT GCCCTGCTGC TTGCAGCGTG 

51 CGGCAGGGAA GAACCGCCCA AGGCGTTGGA ATGCGCCAAC CCCGCCGTGT 

101 TGCAGGACAT ACGCGGCAGT ATTCAGGAAA CGCTCACGCA GGAAGCGCGT 

151 TCTTTCGCGC GCGAAGACGG CAGGCAGTTT GTCGATGCCG ACAAAATTAT 

201 CGCCGCCGCC TACGGTTTGG CGTTTTCTTT GGAACACGCT TCGGAAACGC 

251 AGGAAGGCGG GCGCACGTTC TGTATCGCCG ATTTGAACAT TACCGTGCCG 

301 TCTGAAACGC TTGCCGATGC CGAGGCAAAC AGCCCCCTGC TGTATGGGGA 

351 AACGTCTTTG GCAGACATCG TGCAGCAGAA GACGGGCGGC AATGTCGAGT 

401 TTAAAGACGG CGTATTGACG GCAGCCGTCC GCTTCCTGCC CGCCAAAGAC 

451 GCTCGGACGG CATTTATCGA CAACACGGTC GGTATGGCGA CGCAAACGCT 

501 GTCTGCCGCG TTGCTGCCTT ACGGCGTGAA GAGCATCGTG ATGATAGACG 

551 GCAAGGCGGT GACAAAAGAA GACGCGGTCA GGGTTTTGAG CGGCAAAGCC 

601 CGTGAAGAAG AACCGTCCAA ACCCACCCCC GAAGACATTT TGGAACACAA 

651 TGCCGCCGGC GGCGATGCGG GCGTACCCCA AGCCGCAGAA GGCGCACCCG 

7 01 AACCCGAAAT CCTGCATCCC GACGACGTCG AGCGTGCCGA TACCGTTACC 

751 GTATCACGGG GCGAAGTGGA AGAGGCGCGC GTACAAAACC AACGTGCGGA 

801 ATCCGAAATT ACCAAACTTT GGGGAGGACT CGATACCGAC GTGCAAAAAG 

851 AGTTGGTCGG CGAACAGCGC AAGTGGGCGC AGGAAAAAAT CAGcaactgc 

901 cgACAAGCCG CCGCGCAGGC AGACCGGCAG GAATACGCCG AATACCTCAA 

951 GCTCCAATGC GACACGCGGA TGACGCGCGA ACggaTACAG TATCTTCGCG 

1001 GCTATTCCAT CGATTAG 



This encodes a protein having amino acid sequence <SEQ ID 688>: 
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MYRKLIALPF ALLLAA CGRE EPPKALECAN PAVLQDIRGS IQETLTQEAR 

SFAREDGRQF VDADKIIAAA YGLAFSLEHA SETQEGGRTF CIADLNITVP 

SETLADAEAN SPLLYGETSL ADIVQQKTGG NVEFKDGVLT AAVRFLPAKD 

ARTAFIDNTV GMATQTLSAA LLPYGVKSIV MIDGKAVTKE DAVRVLSGKA 

REEEPSKPTP EDILEHNAAG GDAGVPQAAE GAPEPEILHP DDVERADTVT 

VSRGEVEEAR VQNQRAESEI TKLWGGLDTD VQKELVGEQR KWAQEKISNC 
RQAAAQADRQ EYAEYLKLQC DTRMTRERIQ YLRGYSID* 



ORF25ng and ORF25-1 show 95.9% identity in 338 aa overlap: 



orf 25-1 . pep MYRKLIALPFALLLAACGREEPPKALECANPAVLQGIRGNIQETLTQEARSFAREDGRQF 
orf25ng MYRKLIALPFALLLAACGREEPPKALECANPAVLQDIRGSIQETLTQEARSFAREDGRQF 



VDADKIIAAAYGLAFSLEHASETQEGGRTFCIADLNITVPSETLADAKANSPLLYGETAL 
VDADKIIAAAYGLAFSLEHASETQEGGRTFCIADLNITVPSETLADAEANSPLLYGETSL 



SDIVRQKTGGNVEFKDGVLTAAVRFLPVKDGQTAFVDNTVGMAAQTLSAALLPYGVKSIV 
ADIVQQKTGGNVEFKDGVLTAAVRFLPAKDARTAFIDNTVGMATQTLSAALLPYGVKSIV 



MIDGKAVKKEDAVRILSGKAREEEPSKPTPEDILEHNAAGGDAGVPQAAEGAPEPEILHP 
MIDGECAVTKEDAVRVLSGKAREEEPSKPTPEDILEHNAAGGDAGVPQAAEGAPEPEILHP 



DDGERADTVTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEQRKWAQEKISNC 
DDVERADTVTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEQRKWAQEKISNC 



RQAAAQADRQEYAEYLKLQCDTRMTRERIQYLRGYSIDX 
RQAAAQADRQEYAEYLKLQCDTRMTRERIQYLRGYSIDX 



45 Based on this analysis, including the presence of a predicted prokaryotic membrane lipoprotein 
lipid attchment site (underlined) in the gonococcal protein, it was predicted that the proteins from 
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or 
diagnostics, or for raising antibodies. 

ORF25-1 (37kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described 
50 above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure 
16A shows the results of affinity purification of the GST-flision protein, and Figure 16B shows the 
results of expression of the His-fiision in E.coli. Purified His-fiision protein was used to immunise 
mice, whose sera were used for Western blot (Figure 16C), ELISA (positive result), and FACS 
analysis (Figure 16D). These experiments confirm that ORF25-1 is a surface-exposed protein, and 



55 that it is a useful immunogen. 
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Figure 16E shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF25-1. 



Example 82 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 689> 

1 ATGCAGCTGA TCGACTATTC ACATTCATTT TTCTCGGTTG TGCCACCCTT 

51 TTTGGCACTG GCACTTGCCG TCATTACCCG CCGCGTACTG CTGTCTTTAG 

101 GCATCGGTAT TCTGGwysGC GTTGCCTTTT TGGTCGGCGG CAACCCCGTC 

151 GACGGTCTGA CACACCTGAA AGACATGGTC GTCGGCTTGG CTTGGTCAGA 

201 CGsyGATTGG TCGCTGGGCA AACCAAAAAT CTTGGTTTTC CkGATACTTT 

251 TGGGTATTTT TACTTCCCTG CTGACCTACT CCGGCAGCAA T 

// 

851 AC TTCGCTGGTA 

901 TTCGGCGGCA CTTGCGGCGT CTTTGCCGTC GTTCTCTGCA CGCTCGGCAC 

951 GATTAAAACC GCCGACTATC CCAAAGCCGT TTGGCAGGGT GCGAAATCTA 

1001 TGTTCGGCGC AATCGCCATT TTAATCCTCG CTTGGCTCAT CAGTACGGTT 

1051 GTCGGCGAAA TGCACACCGG CGATTACCTC TCCACACTGG TTGCGGGCAA 

1101 CATCCATCCC GGCTTCCTGC CCGTCATCCT CTTCCTGCTC GCCAGCGTGA 

1151 TGGCGTTTGC CACAGGCACA AGCTGGGGGA CGTTCGGCAT TATGCTGCCG 

1201 ATTGCCGCCG CCATGGCGGT CAAAGTCGAA CCCGCGCTGA TTATCCCGTG 

1251 TATGTCCGCA GTAATGGCGG GGGCGGTATG CGGCGACCAC TGCTCGCCCA 

1301 TTTCCGACAC GACCATCCTG TCGTCCACCG GCGCGCGCTG CAACCACATC 

1351 GACCACGTTA CCTCGCAACT GCCTTACGCC TTAACCGTTG CCGCCGCCGC 

1401 CGCATCGGGC TACCTCGCAT TGGGTCTGAC AAAATCCGCG CTGTTGGGCT 

1451 TTGGCACGAC AGGCATTGTA TTGGCGGTGC TGATTTTTCT GTTGAAAGAT 

1501 AAAAAA. . 

This corresponds to the amino acid sequence <SEQ ID 690; ORF26>: 



// 

251 TSLV 

301 FGGTCGVFAV VLCTLGTIKT ADYPKAVWQG AKSMFGAIAI LILAWLISTV 
351 VGEMHTGDYL STLVAGNIHP GFLPVILFLL A3VMAFATGT SWGTFGIMLP 
401 lAAAMAVKVE PALIIPCMSA VMAGAVCGDH C3PISDTTIL SSTGARCNHI 
451 DHVTSQLPYA LTVAAAAASG YLALGLTK3A LLGFGTTGIV LAVLIFLLKD 
501 KK. . 

Further work revealed the complete nucleotide sequence <SEQ ID 691>: 



1 ATGCAGCTGA TCGACTATTC ACATTCATTT TTCTCGGTTG TGCCACCCTT 

51 TTTGGCACTG GCACTTGCCG TCATTACCCG CCGCGTACTG CTGTCTTTAG 

101 GCATCGGTAT TCTGGTCGGC GTTGCCTTTT TGGTCGGCGG CAACCCCGTC 

151 GACGGTCTGA CACACCTGAA AGACATGGTC GTCGGCTTGG CTTGGTCAGA 

201 CGGCGATTGG TCGCTGGGCA AACCAAAAAT CTTGGTTTTC CTGATACTTT 

251 TGGGTATTTT TACTTCCCTG CTGACCTACT CCGGCAGCAA TCAGGCGTTT 

301 GCCGACTGGG CAAAACGGCA CATTAAAAAC CGGCGCGGCG CGAAAATGCT 

351 GACCGCCTGC CTCGTGTTCG TAACCTTTAT CGACGACTAT TTCCACAGTC 

401 TCGCCGTCGG TGCGATTGCC CGCCCCGTTA CCGACAAGTT TAAAGTTTCC 

451 CGCACCAAAC TCGCCTACAT CCTCGACTCC ACTGCCGCTC CTATGTGCGT 

501 GCTGATGCCC GTTTCAAGCT GGGGCGCGTC GATTATCGCC ACGCTTGCCG 

551 GACTGCTCGT TACCTACAAA ATCACCGAAT ACACGCCGAT GGGGACGTTT 

601 GTCGCCATGA GCCTGATGAA CTATTACGCA CTGTTTGCCC TGATTATGGT 

651 GTTCGTCGTC GCATGGTTTT CCTTCGACAT CGGCTCGATG GCACGTTTCG 

701 AACAAGCCGC GTTGAACGAA GCCCACGATG AAACTGCCGT TTCAGACGCT 

751 ACCAAAGGTC GTGTTTACGC ACTGATTATT CCCGTTTTGG CCTTAATCGC 

801 CTCAACGGTT TCCGCCATGA TCTACACCGG CGCGCAGGCA AGCGAAACCT 

851 TCAGCATTTT GGGGGCATTT GAAAACACGG ACGTAAACAC TTCGCTGGTA 

901 TTCGGCGGCA CTTGCGGCGT CCTTGCCGTC GTTCTCTGCA CGCTCGGCAC 

951 GATTAAAACC GCCGACTATC CCAAAGCCGT TTGGCAGGGT GCGAAATCTA 

1001 TGTTCGGCGC AATCGCCATT TTAATCCTCG CTTGGCTCAT CAGTACGGTT 

1051 GTCGGCGAAA TGCACACCGG CGATTACCTC TCCACACTGG TTGCGGGCAA 

1101 CATCCATCCC GGCTTCCTGC CCGTCATCCT CTTCCTGCTC GCCAGCGTGA 

1151 TGGCGTTTGC CACAGGCACA AGCTGGGGGA CGTTCGGCAT TATGCTGCCG 

1201 ATTGCCGCCG CCATGGCGGT CAAAGTCGAA CCCGCGCTGA TTATCCCGTG 

1251 TATGTCCGCA GTAATGGCGG GGGCGGTATG CGGCGACCAC TGCTCGCCCA 

1301 TTTCCGACAC GACCATCCTG TCGTCCACCG GCGCGCGCTG CAACCACATC 
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1351 GACCACGTTA CCTCGCAACT GCCTTACGCC TTAACCGTTG CCGCCGCCGC 

1401 CGCATCGGGC TACCTCGCAT TGGGTCTGAC AAAATCCGCG CTGTTGGGCT 

1451 TTGGCACGAC AGGCATTGTA TTGGCGGTGC TGATTTTTCT GTTGAAAGAT 

1501 AAAAAACGCG CCAACGCCTG A 

This corresponds to the amino acid sequence <SEQ ID 692; ORF26-l>: 

1 MQLIDYSHSF FSWPPFLAL A LAVITRR VL LSLGIGILVG VAFLV GGNPV 

51 DGLTHLKDMV VGLAWSDGDW SLGKP KILVF LILLGIFTSL LTY SGSNQAF 

101 ADWAKRHIKN R RGAKMLTAC LVFVTFID DY FHSLAVGAIA RPVTDKFKVS 

151 RTKLAYILDS TAAPMCVLMP VSSWGASIIA TLAGLLV TYK ITEYTPMGTF 

201 VAMSLMNYYA LFALIMVFW AWFSFDI GSM ARFEQAALNE AHDETAVSDA 

251 TKGRVY ALII PVLALIASTV SAMI YTGAQA SETFSILGAF ENTDVNTSLV 

301 FGGTCGVLAV VLCTL GTIKT ADYPKAVWQG AK5M FGAIAI LILAWLISTV 

351 VGEMHTGDYL STLVAGNIHP GFLPVILFLL ASVMAFA TGT SW GTFGIMLP 

401 lAAAMAVKV E P ALIIPCMSA VMAGAVCG DH CSPISDTTIL SSTGARCNHI 

451 DHVTSQLPY A LTVAAAAASG YLALGL TKSA LLGFGTTGIV LAVLIFL LKD 

501 KKRANA* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with the hypothetical transmembrane protein HI 1 586 of H. influenzae (accession number P44263) 
ORF26 and HI1586 show 53% and 49% amino acid identity in 97 and 221 aa overlap at the 
N-terminus and C-terminus, respectively: 

Orf2 6 1 MQLIDYSHSFFSVVPPFLALALAVITRRVXXXXXXXXXXXVAFLVGGNPVDGLTHLKDMV 60 

M+LID+S S +S+VP LA+ LA+ TRRV L +L V 

HI1586 14 MELIDFSSSVWSIVPALLAIILAIATRRVLVSL3AGIIIGSLMLSDWQIGSAFNYLVKNV 73 

Orf2 6 61 VGLAWSDXDWSLGKPKILVFXILLGIFT3LLTY3G3N 97 

V L ++D + + I++F +LLG+ T+LLT 3GSN 

HI1586 74 VSLVYADGEIN-SNMNIVLFLLLLGVLTALLTV3GSN 109 

// 

Orf2 6 86 IFTSLLTYSGS--NTSLVFGGTCGVFAWLCTL--GTIKTADYPKAVWQGAKSMFGXXXX 141 

+F+ L T+ + TSLV GG C + L + + +Y ++ G KSM G 
HI1586 299 VFSVLGTFENTWGTSLWGGFCSIIISTLLIILDRQVSVPEYVRSWIVGIKSMSGAIAI 358 

Orf26 142 XXXXXXXSTVVGEMHTGDYLSTLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLP 201 

+ +VG+M TG YLS+LV+GNI FLPVILF+L + MAF+TGTSWGTFGIMLP 
HI15B6 359 LFFAWTINKIVGDMQTGKYLSSLVSGNIPMQFLPVILFVLGAAMAFSTGTSWGTFGIMLP 418 

Orf26 202 IAAAMAVKVEPALIIPCMSAVMAGAVCGDHCSPI3DTTILSSTGARCNHIDHVTSQXXXX 261 

lAAAMA P L++PC+SAVMAGAVCGDHCSP+SDTTILSSTGA+CNHIDHVT+Q 
HI1586 419 lAAAMAANAAPELLLPCLSAVMAGAVCGDHCSPVSDTTILSSTGAKCNHIDHVTTQLPYA 478 

Orf26 262 XXXXXXXXXXXXXXXXXKSALLGFGTTGIVLAVLIFLLKDK 302 

S L GF T + L V+IF +K + 
HI1586 479 ATVATATSIGYIWGFTYSGLAGFAATAVSLIVIIFAVKKR 519 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF26 shows 58.2% identity over a 502aa overlap with an ORF (ORF26a) from sfrain A of TV. 
meningitidis: 

10 20 30 40 50 60 

or f 2 6 . pep MQLIDYSHSFFSWPPFLALA LAVITRR VLLSLGIGILXXVAFLV GGNPVDGLTHLKDMV 

I I I 1 I I I I I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I ! I I I 1 I I I I I I I I I I I I 

orf 2 6a MQLIDYSHSFFSWPPFLALA LAVITRR VLLSLGIGILVGVAFLV GGNPVDGLTHLKDMV 
10 20 30 40 50 60 

70 80 90 99 

orf 26. pep VGLAMSDXDWSLGKPK ILVFXILLGIFTSLLTY SGSNXX 

I I I I I I I I I I I I I I I III I I I I I I I I I I I I I I I 
orf 2 6a VGLAWSDGDWSLGKPK XLVFLILLGIFTSLLTY SGSNQAFADWAKRHIKNR RGAKMLTAC 
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or f 2 6. pep 

or f 2 6a LVFVTFID DYFH3LAVGAXARPVTDKFKVSRAKLAYILDSTAAPMCVLMP VSSWGASIIA 
130 140 150 160 170 180 



orf2 6.pep 

orf 2 6a TLAGLLV TYKITEYTPMGTFVAMSLMNYYA LFALIMVFWAWFSFDI GSMARFEQAALNE 
190 200 210 220 230 240 



I I I I 

AHDETAVSDGSWGRVYA LIIPVLALIASTVSAMI YTGAQASETFSILGAFENTDVNTSLV 
250 260 270 280 290 300 

120 130 140 150 160 170 

FGGTCGVFAWLCTL GTIKTADYPKAVWQGAKSM FGAIAILILAWLiaTW GEMHTGDYL 

FGGTCGVLAWLCTL GT IKI ADYPKAWQGAKSM FGAIAI L ILAWL I S T VV GEMHTGDYL 
310 320 330 340 350 360 

180 190 200 210 220 230 

STLVAGNIHP GFLPVILFLLASVMAFA TGTSW GTFGIMLPIAAAMAVKV EP ALIIPCMSA 

STLVAGNIHP GFLXVILFLLASVMAFA TGTSW GTFGIMLPIAAAMAVKV DP SLIIPCMSA 
370 380 390 400 410 420 



240 250 260 270 280 290 

VMAGAVCG DHCSPISDTTILSSTGARCNHIDHVTSQLPY ALTVAAAAASGYLALGL TKSA 



430 440 450 460 470 480 

300 310 
LLGFGTTGIVLAVLIFL LKDKK 

LLGFGXTGIVLAVLIFL LKDKKRANAX 
490 500 

The complete length ORF26a nucleotide sequence <SEQ ID 693> is: 



1 ATGCAGCTGA TCGACTATTC ACATTCATTT TTCTCGGTTG TGCCACCCTT 

51 TTTGGCACTG GCACTTGCCG TCATTACCCG CCGCGTACTG CTGTCTTTAG 

101 GCATCGGTAT TCTGGTCGGC GTTGCCTTTT TGGTCGGCGG CAACCCCGTC 

151 GACGGTCTGA CACACCTGAA AGACATGGTC GTCGGCTTGG CTTGGTCAGA 

201 CGGCGATTGG TCGCTGGGCA AACCAAAANT CTTGGTTTTC CTGATACTTT 

251 TGGGTATTTT TACTTCCCTG CTGACCTACT CCGGCAGCAA TCAGGCGTTT 

301 GCCGACTGGG CAAAACGGCA CATTAAAAAC CGGCGCGGCG CGAAAATGCT 

351 GACCGCCTGC CTCGTGTTCG TAACCTTTAT CGACGACTAT TTCCACAGTC 

401 TCGCCGTCGG TGCGNTTGCC CGCCCCGTTA CCGACAAGTT TAAAGTTTCC 

451 CGCGCCAAAC TCGCCTACAT CCTCGACTCC ACTGCCGCGC CTATGTGCGT 

501 GCTGATGCCC GTTTCAAGCT GGGGCGCGTC GATTATCGCC ACGCTTGCCG 

551 GACTGCTCGT TACCTACAAA ATCACCGAAT ACACGCCGAT GGGGACGTTT 

601 GTCGCCATGA GCCTGATGAA CTATTACGCA CTGTTTGCCC TGATTATGGT 

651 GTTCGTCGTC GCATGGTTCT CCTTCGACAT CGGCTCGATG GCACGTTTCG 

701 AACAAGCCGC GTTGAACGAA GCCCACGATG AAACTGCCGT TTCAGACGGC 

7 51 AGCTGGGGCA GGGTTTACGC ATTGATTATT CCCGTTTTGG CCTTAATCGC 
801 CTCAACGGTT TCCGCCATGA TCTACACCGG TGCACAGGCA AGCGAAACCT 

8 51 TCAGCATTTT GGGTGCATTT GAAAATACGG ACGTGAACAC TTCGCTGGTA 
901 TTCGGCGGCA CTTGCGGCGT GCTTGCCGTC GTCCTCTGCA CGCTCGGCAC 
951 GATTAAAATC GCCGATTATC CCAAAGCCGT TTGGCAGGGT GCGAAATCCA 

1001 TGTTCGGCGC AATCGCCATT TTAATCCTTG CCTGGCTCAT CAGTACGGTT 

1051 GTCGGCGAAA TGCACACAGG CGACTACCTC TCCACGCTGG TTGCGGGCAA 

1101 CATCCATCCC GGCTTCCTGN CCGTCATCCT TTTCCTGCTC GCCAGCGTGA 

1151 TGGCGTTTGC CACAGGCACA AGCTGGGGGA CGTTCGGCAT CATGCTGCCG 

1201 ATTGCCGCCG CCATGGCGGT CAAAGTCGAT CCCTCACTGA TTATCCCGTG 

1251 TATGTCCGCC GTGATGGCGG GGGCGGTATG CGGCGACCAC TGCTCGCCCA 

1301 TTTCCGACAC GACCATCCTG TCGTCCACCG GCGCGCGCTG CAACCACATC 



orf 26. pep 
orf 2 6a 

orf 2 6. pep 
orf26a 

orf26.pep 
orf26a 

orf26.pep 
orf26a 
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1351 GACCACGTTA CNTCGCAACT GCCTTACGCC TTAACCGTTG CCGCCGCCGC 

14 01 CGCATCGGGN TACCTCGCAT TGGGTCTGAC AAAATCCGCG CTGTTGGGTT 

14 51 TTGGCANGAC AGGCATTGTA TTGGCGGTGC TGATTTTTCT GTTGAAAGAT 

1501 AAAAAACGCG CCAACGCCTG A 

This encodes a protein having amino acid sequence <SEQ ID 694>: 

1 MQLIDYSHSF FSWPPFLAL A LAVITRR VL LSLGIGILVG VAFLV GGNPV 

51 DGLTHLKDMV VGLAWSDGDW SLGKPK XLVF LILLGIFTSL LTY SGSNQAF 

101 ADWAKRHIKN R RGAEOXILTAC LVFVTFID DY FHSLAVGAXA RPVTDKFKVS 

151 RAKLAYILDS TAAPMCVLMP VSSWGASIIA TLAGLLV TYK ITEYTPMGTF 

201 VAMSLMNYYA LFALIMVFW AWFSFDI GSM ARFEQAALNE AHDETAVSDG 

251 SWGRVY ALII PVLALIASTV SAMI YTGAQA SETFSILGAF ENTDVNTSLV 

301 FGGTCGVLAV VLCTL GTIKI ADYPKAVWQG AKSM FGAIAI LILAWLISTV 

351 VGEMHTGDYL STLVAGNIHP GFLXVILFLL ASVMAFA TGT SW GTFGIMLP 

401 lAAAMAVKV D P SLIIPCMSA VMAGAVCG DH CSPISDTTIL SSTGARCNHI 

451 DHVTSQLPY A LTVAAAAASG YLALGL TKSA LLGFGXTGIV LAVLIFL LKD 

501 KKRANA* 

ORF26a and ORF26-1 show 97.8% identity in 506 aa overlap: 

10 20 30 40 50 60 

MQLIDYSHSFFSWPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV 

I I I I I I I I I M M I I I I M 1 I I 1 I I I i I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

MQLIDYSHSFFSWPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV 
10 20 30 40 50 60 



orf2 6a.pep 
orf26-l 



VGLAWSDGDWSLGKPKXLVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRRGAKMLTAC 

M I I I I I I I I I I I 1 I I I I ! I I I I I I I I I I I I I I I I I 1 I i I I I I I I I I I I I I I I I I I ! I 
VGLAWSDGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRRGAKMLTAC 



LVFVTFIDDYFHSLAVGAXARPVTDKFKVSRAKLAYILDSTAAPMCVLMPVSSWGASIIA 

I I I I I I I I I I I I I I I I I I I I I !! I I I I I I I : I I I I I I I I I ! I I I I I I I I I I I I I I i I I 
LVFVTFIDDYFHSLAVGAIARPVTDKFKVSRTKLAYILDSTAAPMCVLMPVSSWGASIIA 



TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFWAWFSFDIGSMARFEQAALNE 

I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! 

TLAGLLVT YKITEYTPMGT FVAMS LMNYYALFAL IbWFWAWFS FD IG SMARFEQAALNE 



AHDETAVS DGSWGRVYAL 1 1 PVLALI ASTVSAMI YTGAQASET FS I LGAFENT DVNTS LV 
AHDETAVSDATKGRVYALIIPVLALIASTVSAMIYTGAQASETFSILGAFENTDVNTSLV 



FGGTCGVLAWLCTLGTIKIADYPKAVWQGAKSMFGAIAILILAWLISTWGEMHTGDYL 
FGGTCGVLAWLCTLGTIKTADYPKAVWQGAKSMFGAIAILILAWLISTWGEMHTGDYL 



STLVAGNIHPGFLXVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVDPSLIIPCMSA 
STLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVEPALIIPCMSA 



VMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQLPYALTVAAAAASGYLALGLTKSA 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
VMAGAVCGDHCS PI S DTT I LS STGARCNHI DHVT SQLPYALTVAAAAASGYLALGLTKSA 



orf26a.pep 



490 500 
LLGFGXTGIVLAVLIFLLKDKKRANAX 
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orf26-l LLGFGTTGIVLAVLIFLLKDKKRANAX 
490 500 

Homology with a predicted ORF from N.sonorrhoeae 

ORF26 shows 94.8% and 99% identity in 97 and 206 aa overlap at the N-terminus and C-ten 
respectively, with a predicted ORF (ORF26ng) from N. gonorrhoeae: 

orf2 6.pep MQLIDYSHSFFSVVPPFLALALAVITRRVLLSLGIGILXXVAFLVGGNPVDGLTHLKDMV 
I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf26ng MQLIDYSHSFFSVVPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV 

orf26.pep VGLAWSDXDWSLGKPKILVFXILLGIFTSLLTYSGSN 

orf2 6ng VGLAWADGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRCGAKMLTAC 
// 

orf2 6.pep TSLVFGGTCGVFAWLCTLGTIKTADYPKA 
I I I I I I I I I I I : I I I I I I : I I I I I I I I I I I 
orf2 6ng ASTVSAMIYTGAQASETFSILGAFENTDVNTSLVFGGTCGVLAVVLCTFGTIKTADYPKA 

orf 2 6 . pep VWQGAKSMFGAIAILILAWLISTVVGEMHTGDYLSTLVAGNIHPGFLPVILFLLASVMAF 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf2 6ng VWQGAKSMFGAIAILILAWLISTWGEMHTGDYLSTLVAGNIHPGFLPVILFLLASVMAF 

orf 26. pep ATGTSWGTFGIMLPIAAAMAVKVEPALIIPCMSAVMAGAVCGDHCSPISDTTILSSTGAR 

orf2 6ng ATGTSWGTFGIMLPIAAAMAVKVEPALIIPCMSAVMAGAVCGDHCSPISDTTILSSTGAR 

orf 2 6. pep CNHIDHVTSQLPYALTVAAAAASGYLALGLTKSALLGFGTTGIVLAVLIFLLKDKK 

orf2 6ng CNHIDHVTSQLPYALTVAAAAASGYLALGLTKSALLGFGTTGIVLAVLIFLLKDKKRADV 

The complete length ORF26ng nucleotide sequence <SEQ ID 695> is: 

1 ATGCAGCTGA TTGACTATTC ACATTCATTT TTCTCGGTTG TGCCACCCTT 

51 TTTGGCACTG GCACTTGCCG TCATTACCCG CCGCGTACTG CTGTCTTTAG 

101 GCATCGGTAT TTTGGTCGGC GTTGCCTTTT TGGTCGGCGG CAACCCCGTC 

151 GACGGTCTGA CACACCTGAA AGACATGGTC GTCGGCTTGG CTTGGGCAGA 

201 CGGCGATTGG TCGCTGGGCA AACCAAAAAT CTTGGTTTTC CTGATACTTT 

251 TGGGCATTTT CACTTCACTG CTGACCTACT CCGGCAGCAA TCAGGCGTTT 

301 GCCGACTGGG CAAAACGGCA CATTAAAAAC CGGTGCGGCG CGAAAATGCT 

351 GACCGCCTGC CTCGTGTTCG TAACCTTTAT CGACGACTAT TTCCACAGCC 

4 01 TCGCCGTCGG TGCGATTGCC CGCCCCGTTA CCGACAAGTT TAAAGTTTCC 

451 CGCGCCAAAC TCGCCTACAT CCTCGACTCC ACTGCCTCGC CCATGTGCGT 

501 GCTGATGCCC GTTTCAAGCT GGGGCGCGTC GATTATCGCC ACGCTTGCCG 

551 GATTGCTCGT TACCTACAAA ATTACCGAAT ACACGCCGAT GGGGACGTTT 

601 GTCGCCATGA GCCTGATGAA CTATTACGCG CTGTTTGCCC TGATTATGGT 

651 ATTCGTCGTC GCATGGTTCT CCTTCGACAT CGGCTCGAtg gCGCGTTTCG 

701 AACAGGCTGC GTTGAACGAA gcocaggacg aaaccgccgo tTCAGACgCT 

751 ACCAAAGGTC GTGTTTACGC ATTGATTATT CCCGTTTTGG CCTTAATCGC 

801 CTCAACGGTT TCCGCCATGA TCTACACCGG CGCGCAGGCA AGCGAAACCT 

851 TCAGCATTTT GGGGGCATTT GAAAATACCG ACGTAAACAC TTCGCTGGTA 

901 TTCGGCGGCA CTTGCGGCGT GCTTGCCGTC GTCCTCTGCA CGTTCGGCAC 

951 GATTAAAACC GCCGATTATC CCAAAGCCGT GTGGCAGGGT GCGAAATCCA 

1001 TGTTCGGCGC AATCGCCATT TTAATCCTCG CCTGGCTCAT CAGTACGGTT 

1051 GTCGGCGAAA TGCACACGGG CGACTACCTC TCCACGCTGG TTGCGGGCAA 

1101 CATCCATCCC GGCTTCCTGC CCGTCATCCT CTTCCTGCTC GCCAGCGTGA 

1151 TGGCGTTTGC CACAGGCACA AGCTGGGGGA CGTTCGGCAT TATGCTGCCG 

1201 ATTGCCGCCG CCATGGCGGT CAAAGTCGAA CCCGCGCTGA TTAtcccGTG 

1251 TATGTCCGCA GTAATGGCGG GGGCGGTATG CGGCGACCAC TGTTCGCCCA 

1301 TCTCCGACAC GACCATCCTG TCGTCCACCG GCGCGCGCTG CAACCACATC 

1351 GACCACGTTA CCTCGCAACT GCCTTATGCC CTGACGGTTG CCGCCGCCGC 

1401 CGCATCGGGC TACCTCGCAT TGGGTCTGAC AAAATCCGCG CTGTTGGGCT 

1451 TTGGCACGAC CGGTATTGTA TTGGCGGTGC TGATTTTTCT GTTGAAAGAT 

1501 AAAAAACGCG CCGACGTTTG A 

This encodes a protein having amino acid sequence <SEQ ID 696>: 
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MQLIDYSHSF FSVVPPFLAL A LAVITRR VL LSLGIGILVG VAFLV GGNPV 

DGLTHLKDMV VGLAWADGDW SLGKPK ILVF LILLGIFTSL LTY SGSNQAF 

ADWAKRHIKN R CGAKMLTAC LVFVTFID DY FHSLAVGAIA RPVTDKFKVS 

RAKLAYILDS TASPMCVLMP VSSWGASIIA TLAGLLV TYK ITEYTPMGTF 

VAMSLMNYYA LFALIMVFW AWFSFDI GSM ARFEQAALNE AQDETAA3DA 

TKGRVYA LII PVLALIASTV SAMI YTGAQA SETFSILGAF ENTDVNTSLV 

FGGTCGVLAV VLCTF GTIKT ADYPKAVWQG AKSM FGAIAI LILAWLISTV 

VGEMHTGDYL STLVAGNIHP GFLPVILFLL ASVMAFA TGT SW GTFGIMLP 

lAAAMAVKVE PALIIPCMSA VMAGAVCGDH CSPISDTTIL SSTGARCNHI 



ORF26ng and ORF26-1 show 98.4% identity in 505 aa overlap: 



orf26-l.pep MQLIDYSHSFFSVVPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV 
orf26ng MQLIDYSHSFFSWPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV 



VGLAWSDGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRRGAKMLTAC 
I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I 
VGLAWADGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRCGAKMLTAC 



130 140 150 150 170 180 

orf2 6-l.pep LVFVTFIDDYFHSLAVGAIARPVTDKFKVSRTKLAYILDSTAAPMCVLMPVSSWGASIIA 

orf26ng LVFVTFIDDYFHSLAVGAIARPVTDKFKVSRAKLAYILDSTASPMCVLMPVSSWGASIIA 
130 140 150 160 170 180 



190 200 210 220 230 240 

orf26-l.pep TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFWAWFSFDIGSMARFEQAALNE 

orf26ng TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFVVAWFSFDIGSMARFEQAALNE 
190 200 210 220 230 240 

250 260 270 280 290 300 

AHDETAVSDATKGRVYALIIPVLALIASTVSAMIYTGAQASETFSILGAFENTDVNTSLV 

AQDETAASDATKGRVYALIIPVLALIASTVSAMIYTGAQASETFSILGAFENTDVNTSLV 
250 260 270 280 290 300 

310 320 330 340 350 360 

orf2 6-l.pep FGGTCGVLAWLCTLGTIKTADYPKAVWQGAKSMFGAIAILILAWLISTWGEMHTGDYL 
I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf26ng FGGTCGVLAVVLCTFGTIKTADYPKAVWQGAKSMFGAIAILILAWLI3TWGEMHTGDYL 
310 320 330 340 350 360 



orf26-l .pep 
orf26ng 



370 380 390 400 410 420 

orf 26-1. pep STLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVEPALIIPCMSA 

orf26ng STLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVEPALIIPCMSA 
370 380 390 400 410 420 



430 440 450 460 470 480 

orf 2 6-1 . pep VMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQLPYALTVAAAAASGYLALGLTKSA 

orf26ng VMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQLPYALTVAAAAASGYLALGLTKSA 
430 440 450 460 470 480 



LLGFGTTGIVLAVLIFLLKDKKRANAX 
I I I I I I I I I I I I I I I I I I I I I I I I : : 
LLGFGTTGIVLAVLI FLLKDKKRADVX 



In addition, ORF26 ng shows significant homology to a hypothetical H.influenzae protein: 
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spl P44263 I YF86_HAEIN HYPOTHETICAL PROTEIN HI1586 >gi I 1074850 I pir I I C64037 
hypothetical 

protein HI1586 - Haemophilus influenzae (strain Rd KW20) >gi 11574427 (U32832) H. 
influenzae predicted coding region HI1586 [Haemophilus influenzae] Length = 519 
Score = 538 bits (1370), Expect = e-152 

Identities = 280/507 (55%), Positives = 346/507 (68%), Gaps = 7/507 (1%) 

Query: 1 MQLIDYSHSFFSWPPFLALALAVITRRXXXXXXXXXXXXXAFLVGGNPVDGLTHLKDMV 60 

M+LID+S S +S+VP LA+ LA+ TRR L +L V 

Sbjct: 14 MELIDFSSSVWSIVPALLAIILAIATRRVLVSLSAGIIIGSLMLSDWQIGSAFNYLVKNV 73 

Query: 61 VGLAWADGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRCGAKMLTAC 120 

V L +ADG+ + I++FL+LLG+ T+LLT SGSN+AFA+WA+ IK R GAK+L A 

Sbjct: 74 VSLVYADGEIN-SNMNIVLFLLLLGVLTALLTVSGSNRAFAEWAQSRIKGRRGAKLLAAS 132 

Query: 121 LVFVTFIDDYFHSLAVGAIARPVTDKFKVSRAKLAYILDSTASPMCVLMPVSSWGASIIA 180 

LVFVTFIDDYFHSLAVGAIARPVTD+FKVSRAKLAYILDSTA+PMCV+MPVSSWGA II 
Sbjct: 133 LVFVTFIDDYFHSLAVGAIARPVTDRFKVSRAKLAYILDSTAAPMCVMMPVSSWGAYIIT 192 

Query: 181 TLAGLLVTYKITEYTPMGTFVAMSLMNYyALFALI^4VFWAWFSFDIGSMARFEQAALNE 240 

+ GLL TY ITEYTP+G FVAMS MN+YA+F++IMVF VA+FSFDI SM R E+ AL 
Sbjct: 193 LIGGLLATYSITEYTPIGAFVAMSSMNFYAIFSIIMVFFVAYFSFDIASMVRHEKLALKN 252 

Query: 241 AQDETAASDATKGRVYALIIPVLALIASTVSAMIYTGAQA SETFSILGAFENTDVN 296 

+D+ TKG+V LI+P+L LI +TVS MIYTGA+A + FS+LG FENT V 

Sbjct: 253 TEDQLEEETGTKGQVRNLILPILVLIIATVSMMIYTGAEALAADGKVFSVLGTFENTWG 312 

Query: 297 TSLVFGGTCGVL— AVVLCTFGTIKTADYPKAVWQGAKSMFGXXXXXXXXXXXSTWGEM 354 

TSLV GG C ++ +++ + +Y ++ G KSM G + +VG+M 

Sbjct: 313 TSLWGGFCSIIISTLLIILDRQVSVPEYVRSWIVGIKSMSGAIAILFFAWTINKIVGDM 372 

Query: 355 HTGDYLSTLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVEPALI 414 

TG YLS+LV+GNI FLPVILF+L + MAF+TGTSWGTFGIMLPIAAAMA P L+ 
Sbjct: 373 QTGKYLSSLVSGNIPMQFLPVILFVLGAAMAFSTGTSWGTFGIMLPIAAAMAANAAPELL 432 

Query: 415 IPCMSAVMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQXXXXXXXXXXXXXXXXXX 474 

+PC+SAVMAGAVCGDHCSP+SDTTILSSTGA+CNHIDHVT+Q 
Sbjct: 433 LPCLSAVMAGAVCGDHCSPVSDTTILSSTGAKCNHIDHVTTQLPYAATVATATSIGYIW 4 92 

Query: 475 XXXKSALLGFGTTGIVLAVLIFLLKDK 501 

S L GF T + L V+IF +K + 
Sbjct; 493 GFTYSGLAGFAATAVSLIVIIFAVKKR 519 



Based on this analysis, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 83 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 697>: 

1 . .AAGCAATGGT ATGCCGACGN .AGTATCAAG ACGGAAATGG TTATGGTCAA 

51 CGATGAGCCT GCCAAAATTC TGACTTGGGA TGAAAGCGGC CGATTACTCT 

101 CGGAACTGTC TATCCGCCAC CATCAACGCA ACGGGGTGGT TTTGGAGTGG 

151 TATGAAGATG GTTCTAAAAA GAGCGAAGT. GTTTATCAGG ATGACAAGTT 

201 GGTCAGGAAA ACCCAGTGGG ATAAGGATGG TTATTTAATC GAACCCTGA 

This corresponds to the amino acid sequence <SEQ ID 698; ORF27>: 

1 ..KQWYADXSIK TEMVMVNDEP AKILTWDESG RLLSELSIRH HQRNGWLEW 
51 YEDGSKKSEX VYQDDKLVRK TQWDKDGYLI EP* 

Further work revealed the complete nucleotide sequence <SEQ ID 699>: 

1 ATGAAAAAAT TATCTCGGAT TGTATTTTCA ACTGTCCTGT TGGGTTTTTC 

51 GGCCGCTTTG CCGGCGCAGA CCTATTCTGT TTATTTTAAT CAGAACGGAA 

101 AGCTGACGGC GACGATGTCT TCTGCCGCTT ATATCAGGCA ATATAGTGTG 

151 GTGGCGGGTA TTGCGCACGC GCAGGATTTT TATTATCCGT CGATGAAGAA 
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201 ATATTCTGAA CCTTATATCG TTGCTTCAAC GCAAATCAAA TCTTTTGTGC 

251 CTACCCTGCA AAACGGTATG TTGATTTTGT GGCATTTTAA TGGTCAGAAA 

301 AAAATGGCGG GGGGCTTCAG CAAGGGTAAG CCGGACGGGG AGTGGGTCAA 

351 CTGGTATCCG AACGGTAAAA AATCTGCCGT TATGCCTTAT AAAAATGGCT 

4 01 TGAGTGAGGG TACGGGATAC CGCTATTACC GTAACGGCGG CAAGGAAAGC 

4 51 GAAATCCAGT TTAAGCAAAA TAAGGCAAAC GGCGTATGGA AGCAATGGTA 

501 TGCCGACGGC AGTATCAAGA CGGAAATGGT TATGGTCAAC GATGAGCCTG 

551 CCAAAATTCT GACTTGGGAT GAAAGCGGCC GATTACTCTC GGAACTGTCT 

601 ATCCGCCACC ATCAACGCAA CGGGGTGGTT TTGGAGTGGT ATGAAGATGG 

651 TTCTAAAAAG AGCGAAGCTG TTTATCAGGA TGACAAGTTG GTCAGGAAAA 

7 01 CCCAGTGGGA TAAGGATGGT TATTTAATCG AACCCTGA 

This corresponds to the amino acid sequence <SEQ ID 700; ORF27-l>: 

1 MKKLSRIVFS TVLLGFSAAL PAQTYSVYFN QNGKLTATMS SAAYIRQYSV 

51 VAGIAHA QDF YYPSMKKYSE PYIVASTQIK SFVPTLQNGM LILWHFNGQK 

101 KMAGGFSKGK PDGEWVNWYP NGKKSAVMPY KNGLSEGTGY RYYRNGGKES 

151 EIQFKQNKAN GVWTOWYADG SIKTEMVMVN DEPAKILTWD ESGRLLSELS 

201 IRHHQRNGVV LEWYEDGSKK SEAVYQDDKL VRKTQWDKDG YLIEP* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meninsitidis (strain A) 

ORF27 shows 91.5% identity over a 82aa overlap with an ORF (ORF27a) from strain A of A^: 
meningitidis: 

10 20 30 

orf27 .pep KQWYADXSIKTEMVMVNDEPAKILTWDESG 

orf27a LSEGTGXRYYRNGGKESEIQFKQNKANGVWKQWYADGNIKTEMVMVNDEPAKILTWDESG 
140 150 160 170 180 190 



orf27 .pep RLLSELSIRHHQRNGVVLEWYEDGSKKSEXVYQDDKLVRKTQWDKDGYLIEPX 

orf27a RLLSELSIHHHXRNGVVLEWYEDGSKKXEAVYQDDKLVRKTQWDXDGYLIEPX 
200 210 220 230 240 

The complete length ORF27a nucleotide sequence <SEQ ID 701> is: 



1 ATGAAAAAAT TATCTCGGAT TGTATTTTCA ACTGTCCTGT TGGGTTTTTC 

51 GGCCGCTTTG CCGGCGCAGA NCTATTCTGT TTATTTTAAT CAGAACGGGA 

101 AACTGACGGC GACGNTGTCT TCTGCCGCNT ATATCAGGCA ATATAGTGTG 

151 GCGGAGGGTA TTGCGCACGC GCAGGANTTT TANTATCCGT CGATGAAGAA 

201 ATATTCCGAA CCTTATATCG TTGCTTCAAC GCAAATCAAA TCTTTTGTGC 

251 CTACCCTGCA AAACGGTATG TTGATTTTGT GGCATTTTAA NGGTCAGAAA 

301 AAAATGGCNG GGGGCTTCAG CAAGGGTAAG CCGGACGGGG AGTGGGTCAA 

351 CTGGTATCCG AACGGTAAAA AATCTGCCGT TATGCCTTAT AAAAATGGTT 

401 TGAGTGAAGG TACGGGGTNN CGCTATTACC GTAACGGCGG CAAGGAAAGC 

451 GAAATCCAGT TTAAACAGAA TAAGGCAAAC GGCGTATGGA AGCAATGGTA 

501 TGCCGACGGC AATATCAAAA CGGAAATGGT TATGGTCAAT GATGAGCCTG 

551 CCAAAATTCT GACATGGGAT GAAAGCGGTC GATTACTCTC GGAACTGTCT 

601 ATCCATCATC ATNAACGTAA TGGAGTAGTC TTAGAGTGGT ATGAAGATGG 

651 TTCTAAAAAG ANTGAAGCTG TTTATCAGGA TGATAAGTTG GTCAGGAAAA 

701 CCCAGTGGGA TAANGATGGT TATTTAATCG AACCCTGA 

This encodes a protein having amino acid sequence <SEQ ID 702>: 

1 MKKLSRIVFS TVLLGFSAAL PAQXYSVYFN QNGKLTATXS SAAYIRQYSV 

51 AEGIAHA QXF XYPSMKKYSE PYIVASTQIK SFVPTLQNGM LILWHFXGQK 

101 KMAGGFSKGK PDGEWVNWYP NGKKSAVMPY KNGLSEGTGX RYYRNGGKES 

151 EIQFKQNKAN GVWKQWYADG NIKTEMVMVN DEPAKILTWD ESGRLLSELS 

201 IHHHXRNGW LEWYEDGSKK XEAVYQDDKL VRKTQWDXDG YLIEP* 

ORF27a and ORF27-1 show 94.7% identity in 245 aa overlap: 

10 20 30 40 50 60 

or f 27a. pep MKKLSRIVFSTVLLGFSAALPAQXYSVYFNQNGKLTATXSSAAYIRQYSVAEGIAHAQXF 
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XYPSMKKYSEPYIVASTQIKSFVPTLQNGMLILWHFXGQKKMAGGFSKGKPDGEWVNWYP 
YYPSMKKYSEPYIVASTQIKSFVPTLQNGMLILWHFNGQKKMAGGFSKGKPDGEWVNWYP 



NGKKSAVMPYKNGLSEGTGXRYYRNGGKESEIQFKQNKRNGVWKQWYADGNIKTEMVMVN 
NGKKSAVMPYKNGLSEGTGYRYYRNGGKESEIQFKQNKANGVWKQHYADGSIKTEMVMVN 



DEPAKILTWDESGRLLSELSIHHHXRNGVVLEWYEDGSKKXEAVYQDDKLVRKTQWDXDG 
DEPAKILTWDESGRLLSELSIRHHQRNGWLEWYEDGSKKSEAVYQDDKLVRKTQWDKDG 



orf27a.pep YLIEPX 
orf27-l YLIEPX 

Homology with a predicted ORF from N.sonorrhoeae 

ORF27 shows 96.3% identity over 82 aa overlap with a predicted ORF (ORF27ng) from 
N. gonorrhoeae: 

orf27.pep KQWYADXSIKTEMVMVNDEPAKILTHDESG 30 

orf27ng LSEGTGYRYYRNGGKESEIQFKQNKANGVWKQWYADGSIKTEMVMVNDEPAKILTWDESG 193 

orf27 .pep RLLSELSIRHHQRNGWLEWYEDGSKKSEXVYQDDKLVRKTQWDKDGYLIEP 82 
orf27ng RLLSELSIRHHKRNGVVLEWYEDGSKKSEAVYQDDKLVRKTQWDKDGYLIEP 245 



The complete length ORF27ng nucleotide sequence <SEQ ID 703> is: 



ATGAAGAAAT 
GGCCGCTTTG 
AACTGACGGC 
GCGGCGGGTA 
ATATTCCGAA 
CTACCCTGCA 
AAAATGGCGG 
CTGGTATCCG 
TGAGTGAGGG 
GAAATCCAGT 
TGCCGATGGA 
CCAAAATTCT 
ATCCGCCACC 
TTCTAAAAAG 
CCCAATGGGA 



TATCTCGGAT 
CCGGCGCAGA 
GACGATGTCT 
TCGCACACGC 
CCTTATATCG 
AAACGGTATG 
GGGGCTTCAG 
AACGGTAAAA 
TACGGGATAC 
TTAAGCAAAA 
AGTATCAAGA 
GACTTGGGAT 
ATAAACGCAA 
AGCGAGGCTG 
TAAGGATGGT 



TGTATTTTCA 
CCTATTCTGT 
TCTGCCGCTT 
GCAGGATTTT 
TTGCTTCAAC 
TTGATTTTGT 
CAAGGGTAAG 
AATCTGCGGT 
CGTTATTACC 
TAAGGCGAAC 
CGGAAATGGT 
GAAAGCGGCC 
CGGGGTGGTT 
TTTATCAGGA 
TATTTAATCG 



ATCGTACTGT 
TTATTTTAAT 
ATATCAGGCA 
TATTATCCGT 
GCAAATCAAA 
GGCATTTTAA 
CCGGACGGGG 
TATGCCTTAT 
GTAACGGCGG 
GGCGTATGGA 
TATGGTCAAC 
GATTACTTTC 
TTGGAGTGGT 
TGACAAGTTG 
AACCCTGA 



TGGGTTTTTC 
CAGAACGGGA 
ATATAGTGTG 
CGATGAAGAA 
TCTTTTGTGC 
TGGTCAGAAA 
AATGGGTCAA 
AAAAATGGCT 
CAAGGAAAGC 
AGCAATGGTA 
GATGAGCCTG 
GGAACTGTCT 
ATGAAGATGG 
GTCAGGAAAA 



This encodes a protein having amino acid sequence <SEQ ID 704>: 

1 MKKLSRIVFS IVLLGFSAAL PA QTYSVYFH QNGKLTATMS SAAYIRQYSV 

51 AAGIAHAQDF YYPSMKKYSE PYIVASTQIK SFVPTLQNGM LILWHFNGQK 

101 KMAGGFSKGK PDGEWVNWYP NGKKSAVMPY KNGLSEGTGY RYYRNGGKES 

151 EIQFKQNKAN GVWKQWYADG SIKTEMVMVN DEPAKILTWD ESGRLLSELS 

201 IRHHKRNGW LEWYEDGSKK SEAVYQDDKL VRKTQWDKDG YLIEP* 

ORF27ng and ORF27-1 show 98.8% identity in 245 aa overlap: 



MKKLSRIVFSTVLLGFSAALPAQTYSVYFNQNGKLTATMSSAAYIRQYSWAGIAHAQDF 
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25 



MKKLSRIVFSIVLLGFSAALPAQTYSVYFNQNGKLTATMSSAAYIRQYSVAaGIAHAQDF 



YYPSMKKYSEPYIVASTQIKSFVPTLQNGMLILWHFNGQKKMAGGFSKGKPDGEWVNWYP 
YYPSMKKYSEPYIVASTQIKSFVPTLQNGMLILWHFNGQKKMAGGFSKGKPDGEWVNWYP 



NGKKSAVMPYKNGLSEGTGYRYYRNGGKESEIQFKQNKANGVWKQWYADGSIKTEIWMVN 
NGKKSAVMPYKNGLSEGTGYRYYRNGGKESEIQFKQNKANGVWKQWYADGSIKTEMVMVN 



DEPAKILTWDESGRLLSELSIRHHQRNGVVLEWYEDGSKKSEAVYQDDKLVRKTQWDKDG 
DEPAKILTWDESGRLLSELSIRHHKRNGVVLEWYEDGSKKSEAVYQDDKLVRKTQWDKDG 



orf27-l.pep YLIEPX 
orf27ng YLIEPX 



Based on this analysis, including the putative leader sequence in the gonococcal protein, it was 
predicted that the proteins from N. meningitidis and N.gonorrhoeae, and their epitopes, could be 
30 useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF27-1 (24.5kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described 
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure 
17A shows the results of affinity pxmfication of the GST-fusion protein, and Figure 17B shows the 
results of expression of the His-fiasion in E.coli. Purified GST-fusion protein was used to immunise 
35 mice, whose sera were used for ELISA, which gave a positive result, confirming that ORF27-1 is 
a surface-exposed protein and a useful immunogen. 

Example 84 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 705>: 

1 ATGAAATTTA CCAAGCACCC CGTCTGGGCA ATGGCGTTCC GCCCATTTTA 

40 51 TTCGCTGGCG GCTCTGTACG GCGCATTGTC CGTATTGCTG TGGGGTTTCG 

101 GCTACACGGG AACGCACkAG CTGTCCGGTT TCTATTGGCA CGCGCATGAg 

151 ATGATTTGGG GTTATGCCGG ACTGGTCGTC ATCGCCTTCC TGCTGACCGC 

201 CGTCGCCACT TGGACGGGGC AGCCGCCCAC GCGGGGCGGC GTaTCTGGTC 

251 GGCTTGACTA TCTTTTGGCT GGCTGCGCGG ATTGCCGCCT TTATCCCGGG 

45 301 TTGGGGTGCG TCGGCAAGCG GCATACTCGG TACGCTGTTT TTCTGGTACG 

351 GCGCGGTGTG CATGGCTTTG CCCGTTATCC GTTCGCAGAA TCAACGCAAC 

401 TATGTTgCCG TGTTCGCGCT GTTCGTCTTG GGCGGCACGC ATGCGGCGTT 

4 51 CCACGTCCAG CTGCACAACG GCAACCTAGG CGGACTCTTG AGGGGATTGC 

501 AGTCGGGCTT GGTGATG 

50 This corresponds to the amino acid sequence <SEQ ED 706; ORF47>: 

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHX LSGFYWHAHE 
51 MIWGYAGLW lAFLLTAVAT WTGQPPTRGG VLVGLTIFWL AARIAAFIPG 
101 WGASASGILG TLFFWYGAVC MALPVIRSQN QRNYVAVFAL FVLGGTHAAF 



wo 99/24578 



-404- 



PCT/IB98/01665 



151 HVQLHNGNLG GLLSGLQSGL VM 

Further work revealed the complete nucleotide sequence <SEQ ID 707>: 

1 ATGAAATTTA CCAAGCACCC CGTCTGGGCA ATGGCGTTCC GCCCATTTTA 

51 TTCGCTGGCG GCTCTGTACG GCGCATTGTC CGTATTGCTG TGGGGTTTCG 

101 GCTACACGGG AACGCACGAG CTGTCCGGTT TCTATTGGCA CGCGCATGAG 

151 ATGATTTGGG GTTATGCCGG ACTGGTCGTC ATCGCCTTCC TGCTGACCGC 

201 CGTCGCCACT TGGACGGGGC AGCCGCCCAC GCGGGGCGGC GTTCTGGTCG 

251 GCTTGACTAT CTTTTGGCTG GCTGCGCGGA TTGCCGCCTT TATCCCGGGT 

301 TGGGGTGCGT CGGCAAGCGG CATACTCGGT ACGCTGTTTT TCTGGTACGG 

351 CGCGGTGTGC ATGGCTTTGC CCGTTATCCG TTCGCAGAAT CAACGCAACT 

401 ATGTTGCCGT GTTCGCGCTG TTCGTCTTGG GCGGCACGCA TGCGGCGTTC 

4 51 CACGTCCAGC TGCACAACGG CAACCTAGGC GGACTCTTGA GCGGATTGCA 

501 GTCGGGCTTG GTGATGGTGT CGGGTTTTAT CGGTCTGATT GGTACGCGGA 

551 TTATTTCGTT TTTTACGTCC AAACGCTTGA ATGTGCCGCA GATTCCCAGT 

601 CCGAAATGGG TGGCGCAGGC TTCGCTGTGG CTGCCCATGC TGACTGCCAT 

651 GCTGATGGCG CACGGTGTGT TGGCTTGGCT GTCTGCCGTT TTTGCCTTTG 

701 CGGCAGGTGT GATTTTTACC GTGCAGGTGT ACCGCTGGTG GTATAAACCC 

751 GTGTTGAAAG AGCCGATGCT GTGGATTCTG TTTGCCGGCT ATCTGTTTAC 

801 CGGATTGGGG CTGATTGCGG TCGGCGCGTC TTATTTCAAA CCCGCTTTCC 

851 TCAATCTGGG TGTGCATCTG ATCGGGGTCG GCGGTATCGG CGTGCTGACT 

901 TTGGGCATGA TGGCGCGTAC CGCGCTTGGT CATACGGGCA ATCCGATTTA 

951 TCCGCCGCCC AAAGCCGTTC CCGTTGCGTT TTGGCTGATG ATGGCGGCAA 

1001 CCGCCGTCCG TATGGTTGCC GTATTTTCTT CCGGCACTGC CTACACGCAC 

1051 AGCATCCGCA CCTCTTCGGT TTTGTTTGCA CTCGCGCTTT TGGTGTATGC 

1101 GTGGAAGTAT ATTCCTTGGC TGATTCGTCC GCGTTCGGAC GGCAGGCCCG 

1151 GTTGA 

This corresponds to the amino acid sequence <SEQ ID 708; ORF47-l>: 

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHE LSGFYWHAHE 

51 M IWGYAGLVV lAFLLTAV AT WTGQPPTRGG V LVGLTIFWL AARIAAFI PG 

101 WGA5AS GILG TLFFWYGAVC MAL PVIRSQN QRN YVAVFAL FVLGGTHAAF 

151 HVQLHNGNLG GLLSGLQS GL VMVSGFIGLI GTRII SFFTS KRLNVPQIPS 

201 PKW VAQASLW LPMLTAMLMA HGVLAW LSAV FAFAAGVIFT VQV YRWWYKP 

251 VLKEPMLW IL FAGYLFTGLG LIAVGA SYFK P AFLNLGVHL IGVGGIGVL T 

301 LGMMARTALG HTGNPIYPPP KAVP VAFWLM MAATAVRMVA V FSSGTAYTH 

351 SIRTSSVLFA LALLVYA WKY IPWLIRPRSD GRPG* 

Computer analysis of this amino acid sequence predicts a leader peptide and also gave the 
following results: 

Homology with a predicted QRF from N. meningitidis (strain A) 

ORF47 shows 99.4% identity over a 172aa overlap with an ORF (ORF47a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf47 .pep MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHXLSGFYWHAHEM IWGYAGLW 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I 
orf47a MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLW 



lAFLLTAVA TWTGQPPTRGGV LVGLTIFWLAARIAAFI PGWGASAS GILGTLFFWYGAVC 
lAFLLTAV ATWTGQPPTRGGV LVGLTIFWLAARIAAFI PGWGASAS GILGTLFFWYGAVC 



MALPVIRSQNQRN YVAVFALFVLGGTHAAF HVQLHNGNLGGLLSGLQS GLVM 
I I I I I I I 1 I I I I I I I I I i M 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
MALPVIRSQNQRN YVAVFALFVLGGTHAAF HVQLHNGNLGGLLSGLQS GLVMVSGFIGLI 



GTRII SFFTSKRLNVPQIPSPKWVAQASLWLPMLTAMLMAHGVMPWLSAAFAFAAGVIFT 
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The complete length ORF47a nucleotide sequence <SEQ ID 709> is: 



ATGAAATTTA 
TTCACTGGCG 
GCTACACGGG 
ATGATTTGGG 
CGTCGCCACT 
GCTTGACTAT 
TGGGGTGCGT 
CGCGGTGTGC 
ATGTTGCCGT 
CACGTCCAGC 
GTCGGGCTTG 
TTATTTCGTT 
CCGAAATGGG 
GCTGATGGCG 
CGGCAGGTGT 
GTGTTGAAAG 
CGGATTGGGG 
TCAATCTGGG 
TTGGGCATGA 
TCCGCCGCCC 
CCGCCGTCCG 
AGCATACGCA 
GTGGAAGTAT 
GTTGA 



CCAAGCACCC 
GCTCTGTACG 
AACGCACGAG 
GTTATGCCGG 
TGGACGGGGC 
CTTTTGGCTG 
CGGCAAGCGG 
ATGGCTTTGC 
GTTCGCGCTG 
TGCACAACGG 
GTGATGGTGT 
TTTTACGTCC 
TGGCGCAGGC 
CACGGCGTGA 
GATTTTTACC 
AGCCGATGCT 
CTGATTGCGG 
TGTGCATCTG 
TGGCGCGTAC 
AAAGCCGTTC 
TATGGTTGCC 
CCTCTTCGGT 
ATTCCTTGGC 



CGTTTGGGCA 
GCGCATTGTC 
CTGTCCGGTT 
ACTGGTCGTC 
AGCCGCCCAC 
GCTGCGCGGA 
CATACTCGGT 
CCGTTATCCG 
TTCGTCTTGG 
CAACCTAGGC 
CGGGTTTTAT 
AAACGGTTGA 
TTCGCTGTGG 
TGCCTTGGCT 
GTGCAGGTGT 
GTGGATTCTG 
TCGGCGCGTC 
ATCGGGGTCG 
CGCGCTCGGT 
CCGTTGCGTT 
GTATTTTCTT 
TTTGTTTGCA 
TGATTCGTCC 



ATGGCGTTCC 
CGTATTGCTG 
TCTATTGGCA 
ATCGCCTTCC 
GCGGGGCGGC 
TTGCCGCCTT 
ACGCTGTTTT 
TTCGCAGAAT 
GCGGTACGCA 
GGACTCTTGA 
CGGTCTGATT 
ATGTGCCGCA 
CTGCCCATGC 
GTCGGCGGCT 
ACCGCTGGTG 
TTTGCCGGCT 
TTATTTCAAA 
GCGGTATCGG 
CATACGGGCA 
TTGGCTGATG 
CCGGCACTGC 
CTCGCGCTTT 
GCGTTCGGAC 



GCCCGTTTTA 
TGGGGTTTCG 
CGCGCATGAG 
TGCTGACCGC 
GTTCTGGTCG 
TATCCCGGGT 
TCTGGTACGG 
CAACGCAATT 
CGCGGCGTTC 
GCGGATTGCA 
GGTACGCGGA 
GATTCCCAGT 
TGACCGCCAT 
TTCGCGTTTG 
GTATAAGCCT 
ATCTGTTTAC 
CCCGCTTTCC 
CGTGCTGACT 
ATCCGATTTA 
ATGGCGGCAA 
CTACACGCAC 
TGGTGTATGC 
GGCAGGCCCG 



This encodes a protein having amino acid sequence <SEQ ID 710>: 



1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHE LSGFYWHAHE 

51 M IWGYAGLW lAFLLTAV AT WTGQPPTRGG V LVGLTIFWL AARIAAFI PG 

101 W6ASAS GILG TLFFWYGAVC MAL PVIRSQN QRN YVAVFAL FVLGGTHAAF 

151 HVQLHNGNLG GLLSGLQS GL VMVSGFIGLI GTRII SFFTS KRLNVPQIPS 

2 01 PKW VAQASLW LPMLTAMLMA HGVMPW LSAA FAFAAGVIFT VQV YRWWYKP 

2 51 VLKEPMLW IL FAGYLFTGLG LIAVG ASYFK P AFLNLGVHL IGVGGIGVL T 

301 LG^4MARTALG HTGNPIYPPP KAVP VAFWLM MAATAVRMVA V FSSGTAYTH 

351 SIRTSSVLFA LALLVYA WKY IPWLIRPRSD GRPG* 

ORF47a and ORF47-1 show 99.2% identity in 384 aa overlap: 

10 20 30 40 50 60 

MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLW 

MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLW 
10 20 30 40 50 60 

70 80 90 100 110 120 

lAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC 

lAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC 
70 80 90 100 110 120 



oi:f47a.pep 
orf47-l 

orf47a.pep 
orf47-l 



MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI 
MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI 



GTRI I S FFTSKRLNVPQI PS PKWVAQASLWLPMLTAMLMAHGVMPWLSAAFAFAAGVI FT 
GTRI I S FFTSKRLNVPQI PS PECWVAQASLWLPMLTAMLMAHGVLAWLSAVFAFAAGVI FT 



VQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYFKPAFLNLGVHLIGVGGIGVLT 
VQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYFKPAFLNLGVHLIGVGGIGVLT 



310 320 330 340 350 360 
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orf47a.pep LGMMARTALGHTGNPIYPPPKAVPVAFWLMMAATAVRMVAVFSSGTAYTHSIRTSSVLFA 

or f 47-1 LGMMARTALGHTGNPIYPPPKAVPVAFWLMMAATAVRMVAVFSSGTAYTHSIRTSSVLFA 
310 320 330 340 350 360 



LALLVYAWKYIPWLIRPRSDGRPGX 
LALLVYAWKYIPWLIRPRSDGRPGX 



Homology with a predicted ORF from N.gonorrhoeae 

ORF47 shows 97.1% identity over 172 aa overlap with a predicted ORF (ORF47ng) from 
N. gonorrhoeae: 

ORF47 ■ MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLW 60 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
ORF47ng MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLW 60 

ORF47 lAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC 120 

ORF4 7ng lAFLLTAVATWTGQPPTRGGVLVGLTAFWLAARIAAFIPGWGAAASGILGTLFFWYGAVC 120 

ORF4 7 MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVM 172 

ORF4 7ng MALPVIRSQNRRNYVAVFAIFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVWGFIGLI IBO 

The ORF47ng nucleotide sequence <SEQ ID 71 1> is predicted to encode a protein comprising 
amino acid sequence <SEQ ID 712>: 

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHE LSGFYWHAHE 

51 M IWGYAGLW lAFLLTAVA T WTGQPPTRGG VLVGLTAFWL AARIAAFI PG 

101 WGAAA5 GILG TLFFWYGAVC MAL PVIRSQN RRU YVAVFAI FVLGGTHAAF 

151 HVQLHNGNLG GLLSGLQS GL VMVWGFIGLI GMKII SFFTS KRLKLPQIPS 

201 PKWVAHASLW LPMLNAILMA HRVMPW LSAA FPFAAGVIFT VOV YAGGITP 

251 lEETSCGSVA GICYRLGNSS G 

The predicted leader peptide and transmembrane domains are identical (except for an He/ Ala 
substitution at residue 87 and an Leu/Ile substitution at position 140) to sequences in the 
meningococcal protein (see also Pseudomonas stutzeri orf396, accession number e246540): 



>egments ir 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 



ORF47ng 

Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 



-1.91 
-1.44 
-1.38 



Transmembrane 
Transmembrane 
Transmembrane 
Transmembrane 
Transmembrane 
Transmembrane 



Further work revealed the complete gonococcal DNA sequence <SEQ ID 713>: 



1 ATGAAATTTA CCAAACATCC CGTCTGGGCA ATGGCGTTCC GCCCGTTTTA 

51 TTCACTGGCG GCACTGTACG GCGCATTGTC CGTATTGCTG TGGGGTTTCG 

101 GCTACACGGG AACGCACGAG CTGTCCGGTT TCTATTGGCA CGCGCATGAG 

151 ATGATTTGGG GTTATGCCGG TCTCGTCGTC ATCGCCTTCC TGCTGACCGC 

201 CGTCGCCACT TGGACGGGAC AGCCGCCCAC GAGGGGCGGC GTTCTGGTCG 

251 GCTTGACCGC CTTTTGGCTG GCTGCGCGGA TTGCCGCCTT TATCCCGGGT 

301 TGGGGTGCGG CGGCAAGCGG CATACTCGGT ACGCTGTTTT TCTGGTACGG 

351 CGCGGTGTGC ATGGCTTTGC CCGTTATCCG TtcgCAAAAC CGGCGCAACT 

401 ATGtcgCCGT ATTCGCAATA TTTGTGCTGG GCGGTACGCA TGCGgcgTTC 

451 CACGtccAgc tGCACAACGG CAACCTAGGC GGACTCTTGA GCGGATTGCA 

501 GTCGGGCCTG GTTATGGTGT CGGGCTTTAT CGGCCTGATT GGGATGAGGA 

551 TTATTTCGTT TTTTACGTCC AAACGGTTGA ACGTGCCGCA GATTCCCAGT 

601 CCGAAATGGG TGGCGCAGGC TTCGCTGTGG CTACCCATGC TGACCGCCAT 
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651 ACTGATGGCG CACGGCGTGA TGCCTTGGCT GTCGGCGGCT TTCGCGTTTG 

701 CGGCGGGCGT GATTTTTACC GTACAGGTGT ACCGCTGGTG GTATAAACCC 

751 GTATTGAAAG AACCGATGCT GTGGATTCTG TTTGCCGGCT ATCTGTTTAC 

801 CGGATTGGGG CTGATTGCGG TCGGCGCGTC TTATTTCAAA CCTGCCTTCC 

851 TCAATCTGGG CGTACATCTG ATCGGGGTCG GCGGTATCGG CGTGCTGACT 

901 TTGGGCATGA TGGCGCGTAC CGCGCTCGGT CATACGGGCA ATTCGATTTA 

951 TCCGCCGCCC AAAGCCGTTC CCGTTGCGTT TTGGCTGATG ATGGCGGCAA 

1001 CCGCCGTCCG TATGGTTGCC GTATTTTCTT CCGGCACTGC CTACACGCAC 

1051 AGCATCCGCA CGTCTTCGGT TTTGTTTGCA CTCGCGCTGC TGGTGTATGC 

1101 GTGGAAATAC ATTCCGTGGC TGATCCGTCC GCGTTCGGAC GGCAGGCCCG 

1151 GTTGA 

This encodes a protein having amino acid sequence <SEQ ID 714; ORF47ng-l>: 

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHE LSGFYWHAHE 

51 M IWGYAGLW lAFLLTAV AT WTGQPPTRGG V LVGLTAFWL AARIAAFI PG 

101 MGAAAS GILG TLFFWYGAVC MAL PVIRSQN RRN YVAVFAI FVLGGTHAAF 

151 HVQLHNGNLG GLLSGLQS GL VMVSGFIGLI GMRII SFFTS KRLNVPQIPS 

201 PKW VAQASLW LPMLTAILMA HGVMPW LSAA FAFAAGVIFT VQV YRWWYKP 

251 VLKEPMLM IL FAGYLFTGLG LIAVG ASYFK P AFLNLGVHL IGVGGIGVL T 

301 LGMMARTALG HTGNSIYPPP KAVP VAFMLM MAATAVRMVA V FSSGTAYTH 

351 SIRTSSVLFA LALLVYA WKY IPWLIRPRSD GRPG* 

ORF47ng-l and ORF47-1 show 97.4% identity in 384 aa overlap: 

10 20 30 40 50 60 

orf 47-1 .pep MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLW 

orf47ng-l MKFTKHPWAimFRPFYS 



orf 4 7-1. pep IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC 
orf47ng-l IAFLLTAVATWTGQPPTRGGVLVGLTAFWLAARIAAFIPGWGAAASGILGTLFFWYGAVC 



MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI 
MALPVIRSQNRRNYVAVFAIFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI 



>rf4 7-l.pep GTRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAMLMAHGVLAWLSAVFAFAAGVIFT 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I : I I I I : I I I I I I I I I I 
irf4 7ng-l GMRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAILMAHGVMPWLSAAFAFAAGVIFT 



orf 47-1 . pep VQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYFKPAFLNLGVHLIGVGGIGVLT 
orf47ng-l VQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYFKPAFLNLGVHLIGVGGIGVLT 



LGMMARTALGHTGNPIYPPPKAVPVAFWLMMAATAVRMVAVFSSGTAYTHSIRTSSVLFA 
LGMMARTALGHTGNSIYPPPKAVPVAFWLMMAATAVRMVAVFSSGTAYTHSIRTSSVLFA 



370 380 
orf 47-1 . pep LALLVYAWKYIPWLIRPRSDGRPGX 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

orf47ng-l LALLVYAWKYIPWLIRPRSDGRPGX 
370 380 

Furthermore, ORF47ng-l shows significant homology to an ORF from Pseudomonas stutzeri: 

gnl I PID|e246540 (Z73914) ORF396 protein [Pseudomonas stutzeri] Length = 396 
Score = 155 bits (389), Expect = 5e-37 
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Identities = 121/391 (30%), Positives = 169/391 (42%), Gaps = 21/391 (5%) 

Query: 7 PVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFY WHflHEMIWGYAGLV 59 

P+W +AFRPF+ +LY L++ LW +TG GF WH HEM++G+A + 

Sbjct: 14 PIWRLAFRPFFLAGSLYALLAIPLWVAAWTGLWP— GFQPTGGWLAWHRHEMLFGFAMAI 71 

Query: 60 VIAFLLTAVATWTGQPPTRGGVLVGLTAFWLAARIAAFIPGWGAAASGILGTLFFWYGAV 119 

V FLLTAV TWTGQ G LVGL A WLAAR+ ++ G AA L LF 
Sbjct: 72 VAGFLLTAVQTWTGQTAPSGNRLVGLAAVWLAARL-GWLFGLPAAWLAPLDLLFLVALVW 130 

Query: 120 CMALPVIRSQNRRNYVAVFAIFVLGGTHAAFXXXXXXXXXXXXXXXXXXXXXMVSGFIGL 17 9 

MA + + +RNY V + ++ G +V+ + L 

Sbjct: 131 MMAQMLWAVRQKRNYPIVWLSLMLGADVLILTGLLQGNDALQRQGVLAGLWLVAALMAL 190 

Query: 180 IGMRIISFFTSKRLNVPQIPSP-KWVAQASLWLPMLTAILMAHGV MPWLSAAFAFA 234 

IG R+I FFT + L P W+ A L + A+L A GV PL FA 

Sbjct: 191 IGGRVIPFFTQRGLGKVDAVKPWVWLDVALLVGTGVIALLHAFGVAMRPQPLLGLLFV-A 249 

Query: 235 AGVIFTVQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYF-KPAFXXXXXXXXXX 293 

GV +++ RW+ K + K +LW L L+ + + +F A 
Sbjct: 250 IGVGHLLRLMRWYDKGIWKVGLLWSLHVAMLWLWAAFGLALWHFGLLAQSSPSLHALSV 309 

Query: 294 XXXXXXXXXMMARTALGHTGNSIYPPPKAVPVAFWLXXXXXXXXXXXXFSSGTAYTHSIR 353 

M+AR LGHTG + P+AFL FS + 

Sbjct: 310 GSMSGLILAMIARVTLGHTGRPLQLPAGIIG-AFVL FNLGTAARVFLSVAWPVGGLW 365 

Query: 354 TSSVLFALALLVYAWKYIPWLIRPRSDGRPG 384 

++V + LA +Y W+Y P L+ R DG PG 
Sbjct: 366 LAAVCWTLAFALYVWRYAPMLVAARVDGHPG 396 



Based on this analysis, it is predicted that the proteins from N. meningitidis and N.gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 85 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 71 5>: 

1 . .ATGCCGTCTG AAGGTTCAGA CGGCmTCGGT GyCGGGGAAy CAGAAGyGGT 

51 AGCGCATGCC CMTGAGACT TCGTGGGTTT TGAAGCGGGT GTTTTCCAAG 

101 CGTCCCCAGT TGTGGTAACG GTATCCGGTG TCyAArGTCA GCTTGGGyGT 

151 GATGTCGAAa CCGACACCGG CGATGACACC AAGACCyAmG CTGCTGATrC 

201 TGTltGCTTTC GTGATAGGsA GGTTTGyTGG lonksAsyTTG TAyrATwkkG 

251 CCTssCwsTG kAGmGCCkTk CkyTGGTkkA swGrwArTAG TCGTGGTTTy 

301 TkTTyyCACC GAATGAACyT GATGTTTAAC GTGTCCGTAG GCGACGCGCG 

351 CGCCGATATA GGGTTTGAAT TTATCGTTGA GTTTGAAATC GTAAATGGCG 

4 01 GACAAGCCGA GAGAAGAAAC GGCGTGGAAG CTGCCGTTTC CCTGATGTTT 

451 TGTTTGGGTT TCTTTGTAGT TGTTGTTTAT CTCTTCAGTA ACTTTTTTAG 

501 TAGAAGAATT ACTTTCTTTC CATTTTCTGT AACTGGCATA ATCTGCCGCT 

551 ATTCTCCAGC CGCCGAAATC . . 

This corresponds to the amino acid sequence <SEQ ID 716; ORF67>: 

1 ..MPSEGSDGXG XGEXEXVAHA QXDFVGFEAG VFQASPWVT VSGVXXQLGX 

51 DVETDTGDDT KTXAADXVAF VIGRFXGXXL YXXAXXXXAX XWXXXXSRGF 

101 XXHRMNLMFN VSVGDARADI GFEFIVEFEI VNGGQAERRN GVEAAVSLMF 

151 CLGFFVVVVY LFSNFFSRRI TFFPFSVTGI ICRYSPAAEI . . 

Computer analysis of this amino acid sequence gave the following resuUs: 
Homology with a predicted ORF from N.sonorrhoeae 

ORF67 shows 51.8% identity over 199 aa overlap with a predicted ORF (ORF67ng) from 
N.gonorrhoeae: 
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orf67.pep MPSEGSDGXGXGEXEXVAHAQXDFVGFEAG 30 

orf67ng TNFEIAVLSGMTVRVFYCARPAPVNGGRLKMPSEGSDGIGIGESEAVAHAQRGFVGFEAG 14 6 

90 100 110 120 130 140 

orf 67 .pep VFQASPWVTVSGVXXQLGXDVETDTGDDTKTXAADXVAFVIGRFXGXXLYXXAXXXXAX 90 

orf 67ng VFQASPWVAVAGVQGQAGRDVYAHARHRAEAQAAAAVAFLIGVFLRMSVRINRNCCVSI 206 

orf 67 .pep XWXXXXSRGFXXHRMNLMFNVSVGDARADIGFEFIVEFEIVNGGQAERRNGVEAAVSLMF 150 

or f 6 7ng TRVGGKSTCYFFSRI DAVS DVSVGDARTDIGFEFVVE FE I VNGGQAERRNGVECAVFLMF 266 

orf 67. pep CLGFFVV WYLFSNFFSRRITFF-PFSVTGIICRYSPAAEI 190 

orf67ng RLLVFYVKLVAAKSFIILSFQLFYVHGIFIWPFPVTGIIRGDAPAAEWADRHPGVDGM 326 

The ORF67ng nucleotide sequence <SEQ ID 71 7> is predicted to encode a protein comprising 
amino acid sequence <SEQ ID 718>: 

1 MPSETVGSIV NVGVDESVGF SPPFPSIQHF YRFHRIHRIR LFRPPGPMQL 
51 NRHSHGSGNL GRGVWATVLS DKFPCGQVRI PACAGMTNFE lAVLSGMTVR 
101 VFYCARPAPV NGGRLKMPSE GSDGIGIGES EAVAHAQRGF VGFEAGVFQA 
151 SPVWAVAGV QGQAGRDVYA HARHRAEAQ A AAAVAFLIGV FLRMSV RINR 
201 NCCVSITRVG GKSTCYFFSR IDAVSDVSVG DARTDIGFEF WEFEIVNGG 
251 QAERRNGVE C AVFLMFRLLV FYVKLV AAKS F IILSFQLFY VHGIFIW PF 
301 PVTGIIRGDA PAAEWADRH PGVDGMRTDV SEIIAYRAYF VFAWSGWFRI 
351 IVGNAFGGVG * 

Based on the presence of a several putative transmembrane domains in the gonococcal protein, it 
is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 86 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 719> 

1 ATGTTTGCTT TTTTAGAAGC CTTTTTTGTC GAATACGGTT ATGCGGCTGT 

51 TTTTTTTGTA TTGGTCATCT GCGGTTTCGG CGTGCCGATT CCCGAGGATT 

101 TGACCTTGGT AACAGGCGGC GTGATTTCGG GTATGGGTTA TACCAATCCG 

151 CATATTATGT TTGCAGTCGG TATGCTCGGC GTATTGGTCG GGGACGGCAT 

201 CATGTTCGCC GCCGGACGAA TTTGGGGGCA GArArTCCTA rGGTTCArAC 

251 CTATTGCGsG CATCATGACG CCGrAACGTT ATGAGCAGGT TCAGGAAAAA 

301 TTCGACAAAT ACGGTAACTG GGTCTTATTT GTCGCCCGTT TCCTGCCCGG 

351 TTTGAGAACG GCCGTATTTG TTACAGCCGG TATCAGCCGC AAGGTTTCAT 

401 ACTTGCGTTT TATCATTATG GATGGACTGG CCGCA. . . 

This corresponds to the amino acid sequence <SEQ ID 720; ORF78>: 

1 MFAFLEAFFV EYG YAAVFFV LVICGFGVPI PEDLTLVTGG VISGMGYTNP 

51 H IMFAVGMLG VLVGDGIM FA AGRIWGQXXL XFXPIAXIMT PXRYEQVQEK 

101 F DKYGMWVLF VARFLPGL RT AVFVTAGISR KVSYLRFIIM DGLAA. . . 

Further work revealed the complete nucleotide sequence <SEQ ID 721>: 

1 ATGTTTGCTT TTTTAGAAGC CTTTTTTGTC GAATACGGTT ATGCGGCTGT 

51 TTTTTTTGTA TTGGTCATCT GCGGTTTCGG CGTGCCGATT CCCGAGGATT 

101 TGACCTTGGT AACAGGCGGC GTGATTTCGG GTATGGGTTA TACCAATCCG 

151 CATATTATGT TTGCAGTCGG TATGCTCGGC GTATTGGTCG GGGACGGCAT 

201 CATGTTCGCC GCCGGACGAA TTTGGGGGCA GAAAATCCTA AGGTTCAAAC 

251 CTATTGCGCG CATCATGACG CCGAAACGTT ATGAGCAGGT TCAGGAAAAA 

301 TTCGACAAAT ACGGTAACTG GGTCTTATTT GTCGCCCGTT TCCTGCCCGG 

351 TTTGAGAACG GCCGTATTTG TTACAGCCGG TATCAGCCGC AAGGTTTCAT 

401 ACTTGCGTTT TATCATTATG GATGGACTGG CCGCACTGAT TTCCGTCCCT 

4 51 ATTTGGATTT ATCTGGGCGA ATACGGTGCG CACAACATCG ATTGGCTGAT 
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501 GGCGAAAATG CACAGCCTGC AATCGGGTAT TTTTGTTATC TTGGGTATAG 

551 GTGCGACCGT TGTCGCTTGG ATTTGGTGGA AAAAACGCCA ACGTATCCAG 

601 TTTTACCGCA GCAAATTGAA AGAAAAGCGG GCGCAACGCA AAGCCGCCAA 

651 GGCAGCCAAA AAAGCCGCGC AAAGCAAACA ATAA 

This corresponds to the amino acid sequence <SEQ ID 722; ORF78-l>: 

1 MFAFLEAFFV EYG YAAVFFV LVICGFGVPI PEDLTLVTGG VISGMGYTNP 

51 H IMFAVGMLG VLVGDGIM FA AGRIWGQKIL RFKPIARIMT PKRYEQVQEK 

101 FDKYGNW VLF VARFLPGLRT AVFV TAGISR KVSYLR FIIM DGLAALISVP 

151 IWIYLGEYGA HNIDWLMAKM HSLQ SGIFVI LGIGATVVAW I WWKKRQRIQ 

2 01 FYRSKLKEKR AQRKAAKAAK KAAQSKQ* 

Computer analysis of this amino acid sequence predicts several transmembrane domains, and also 
gave the following results: 

Homology with the dedA homologue of H. influenzae (accession number P4528Q) 
ORF78 and the dedA homologue show 58% aa identity in 144aa overlap: 

Orf78: 4 FLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGM—GYTNPHIMFAVGMLGV 61 

FL FF EYGY AV FVL+ICGFGVPIPED+TLV+GGVI+G+ N H+M V M+GV 

DedA: 20 FLIGFFTEYGYWAVLFVLIICGFGVPIPEDITLVSGGVIAGLYPENVNSHLMLLVSMIGV 79 

Orf78: 62 LVGDGIMFAAGRIWGQXXLXFXPIAXIMTPXRYEQVQEKFDKYGNHVLFVARFLPGLRTA 121 

L GD M+ GRI+G L F PI I+T R V+EKF +YGN VLFVARFLPGLR 
DedA: 80 LAGDSCMYWLGRIYGTKILRFRPIRRIVTLQRLRMVREKFSQYGNRVLFVARFLPGLRAP 139 

Orf78: 122 VFVTAGISRKVSYLRFIIMDGLAA 145 

+++ +GI+R+VSY+RF+++D AA 
DedA: 140 lYMVSGITRRVSYVRFVLIDFCAA 163 



Homology with a predicted ORF from N. meningitidis Cstrain A) 

ORF78 shows 93.8% identity over a 145aa overlap with an ORF (ORF78a) from strain A of A': 
meningitidis: 

10 20 30 40 50 60 

orf 7 8 . pep MFAFLEAFFVEYG YAAVFFVLVICGFGVPI PEDLTLVTGGVISGMGYTNPH IMFAVGMLG 

I 1 I : I 1 I I I I I I I I I i ! I I I I I I I I I I I I I I I I I I I 1 I I I 1 I I I I I i I I 1 I I I I I I I I M 
orf 78a MFALLEAFFVEYG YAAVFFVLVICGFGVPI PEDLTLVTGGVISGMGYTNPH IMFAVGMLG 
10 20 30 40 50 60 

70 80 90 100 110 120 

or f 7 8 . pep VLVGDGIM FAAGRIWGQXXLXFXPIAXIMTPXRYEQVQEKFDKYGNW VLFVARFLPGLRT 
i 1 I I I I I I I I I I I I I I I I I III I I II II M I I II II II I I I I I II I II I II I I 
or f 7 8a VLVGDGIM FAAGRIWGQKILKFKPIARIMTPKRYAQVQEKFDKYGNW VLFVARFLPGLRT 
70 80 90 100 110 120 



130 140 
AVFV TAGISRKVSYLR FIIMDGLAA 

AVFV TAGISRKVSYLR FLIMDGLAALISVPVWI YLGEYGAHNIDWLMAKMHSLQ SGIFIA 
130 140 150 160 170 180 

The complete length ORF78a nucleotide sequence <SEQ ID 723> is: 

1 ATGTTTGCCC TTTTGGAAGC CTTTTTTGTC GAATACGGCT ATGCGGCCGT 

51 GTTTTTCGTT TTGGTCATCT GCGGTTTCGG CGTGCCGATT CCCGAGGATT 

101 TGACCTTGGT AACAGGCGGC GTGATTTCGG GTATGGGTTA TACCAATCCG 

151 CATATTATGT TTGCAGTCGG TATGCTCGGC GTATTGGTCG GGGACGGCAT 

201 CATGTTCGCC GCCGGACGCA TCTGGGGGCA GAAAATCCTC AAGTTCAAAC 

251 CGATTGCGCG CATCATGACG CCGAAACGTT ACGCACAGGT TCAGGAAAAA 

301 TTCGACAAAT ACGGCAACTG GGTGTTATTT GTCGCTCGTT TCCTGCCCGG 

351 TTTGCGGACT GCCGTTTTCG TTACCGCCGG CATCAGCCGC AAAGTATCGT 

4 01 ATCTGCGCTT TCTGATTATG GACGGGCTTG CCGCGCTGAT TTCCGTGCCC 

4 51 GTTTGGATTT ACTTGGGCGA GTACGGCGCG CACAACATCG ATTGGCTGAT 



orf78 .pep 
orf78a 
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501 GGCGAAAATG CACAGCCTGC AATCCGGCAT CTTCATCGCA TTGGGCGTGC 

551 TGGCGGCGGC GCTGGCGTGG TTCTGGTGGC GCAAACGCCG ACATTATCAG 

601 CTTTACCGCG CACAATTGAG CGAAAAACGC GCCAAACGCA AGGCGGAAAA 

651 GGCAGCGAAA AAAGCGGCAC AGAAGCAGCA GTAA 

This encodes a protein having amino acid sequence <SEQ JD 724>: 

1 MFALLEAFFV EYG YAAVFFV LVICGFGVPI PEDLTLVTGG VISGMGYTNP 

51 H IMFAVGMLG VLVGDGIM FA AGRIWGQKIL KFKPIARIMT PKRYAQVQEK 

101 FDKYGNW VLF VARFLPGLRT AVFV TAGISR KVSYLR FLIM DGLAALISVP 

151 VWIYLGEYGA HNIDWLMAKM HSLQ SGIFIA LGVLAAALAW F WWRKRRHYQ 

201 LYRAQLSEKR AKRKAEKAAK KAAQKQQ* 

ORF78a and ORF78-1 show 89.0% identity in 227 aa overlap: 

10 20 30 40 50 60 

MFALLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG 

I M : I ! 1 I I I ] I I I I I 1 I I I i I I I I I I I ! I I I I I I I I I i i I I I I I I I I I I I I I I I I I I I I 
MFAFLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG 
10 20 30 40 50 60 



orf78a.pep 
orf78-l 



70 80 90 100 110 120 

orf78a.pep VLVGDGIMFAAGRIWGQKILKFKPIARIMTPKRYAQVQEKFDKYGNWVLFVARFLPGLRT 

I I I I I I I I I I I I I I ! I I I i I : I i I I ! I I I I I I I 1 I I I I i I I I I I I I I I I I I I I I I I I I I 
orf78-l VLVGDGIMFAAGRIWGQKILRFKPIARIMTPKRYEQVQEKFDKYGNWVLFVARFLPGLRT 
70 80 90 100 110 120 



AVFVTAGISRKVSYLRFLIMDGLAALISVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIA 
I I 1 I I I I I I I I I I I I I I : I I I I I I I I I i ! I : I 1 I I I I i I I I I I I I I I I I I I I I I I I I I : 
AVFVTAGISRKVSYLRFIIMDGLAALISVPIWIYLGEYGAHNIDWLMAKMHSLQSGIFVI 



or f 7 8a . pep LGVLAAALAWFWWRKRRHYQLYRAQLSEKRAKRKAEKAAKKAAQKQQX 
orf7 8-l LGIGATVVAWIWWKKRQRIQFYRSKLKEKRAQRKAAKAAKKAAQSKQX 



Homology with a predicted ORF from N.sonorrhoeae 

ORF78 shows 97.4% identity over 38 aa overlap with a predicted ORF (ORF78ng) from N. 
gonorrhoeae: 

orf 78 .pep XXLXFXPIAXIMTPXRYEQVQEKFDKYGNWVLFVARFLPGLRTAVFVTAGISRKVSYLRF 137 

orf78ng YPVLFVARFLPGLRTAVFVTAGISRKVSYLRF 32 

orf78.pep IIMDGLAA 145 

orf7 8ng LIMDGLAALISVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIALGVLAAALAWFWWRKRR 92 

The ORF78ng nucleotide sequence <SEQ ID 725> is predicted to encode a protein comprising 
amino acid sequence <SEQ ID 726>: 

1 . .YP VLFVARFL PGLRTAVFV T AGISRKVSYL R FLIMDGLAA LISVPVWI YL 
51 GEYGAHNIDW LMAKMHSLQ S GIFIALGVLA AALAWF WWRK RRHYQLYRAQ 
101 LSEKRAKRKA EKAAKKAAQK QQ* 

Further work revealed the complete gonococcal nucleotide sequence <SEQ ID 727>: 



atgtttgccc tttTggaagc CTTTTTTGTC GAAtacggCt atgcGGCCGT 
GTTTTTCGTT TTGGTCATCT GCGGTTTCGG CGTGCCGATT CCCGAAGATT 
TGACCTTGGT AACGGGCGGC GTGATTTCGG GTATGGGTTA TACCAATCCG 
CATATTATGT TTGCGGTCGG TATGCTCGGC GTGTTGGCGG GCGACGGCGT 
GATGTTTGCC GCCGGACGCA TCTGGGGGCA GAAAATCCTC AAGTTCAAAC 
CGATTGCGCG CATCATGACG CCGAAACGTT ACGCGCAGGT TCAGGAAAAA 
TTCGACAAAT ACGGCAACTG GGTTCTGTTT GTCGCCCGTT TCCTGCCGGG 
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351 TTTGCGGACT GCCGTTTTCG TTACCGCCGG CATCAGCCGC AAAGTATCGT 

4 01 ATCTGCGCTT TCTGATTATG GACGGGCTGG CCGCGCTGAT TTCCGTGCCC 

4 51 GTTTGGATTT ACTTGGGCGA GTACGGCGCG CACAACATCG ATTGGCTGAT 

501 GGCGAAAATG CACAGCCTGC AATCGGGCAT CTTCATCGCA TTGGGCGTGC 

551 TGGCGGCGGC GCTGGCGTGG TTCTGGTGGC GCAAACGCCG ACATTATCAG 

601 CTTTACCGCG CACAATTGAG CGAAAAACGC GCCAAACGCA AGGCGGAAAA 

651 GGCAGCGAAA AAAGCGGCAC AGAAGCAGCA GTAa 

This corresponds to the amino acid sequence <SEQ ID 728; ORF78ng-l>: 

1 MFALLEAFFV EYG YAAVFFV LVICGFGVPI PEDLTLVTGG VISGMGYTNP 

51 H IMFAVGMLG VLAGDGVM FA AGRIWGQKIL KFKPIARIMT PKRYAQVQEK 

101 FDKYGNW VLF VARFLPGLRT AVFV TAGISR KVSYLR FLIM DGLAALISVP 

151 VWIYLGEYGA HNIDWLMAKM HSLQ SGIFIA LGVLAAALAW F WWRKRRHYQ 

201 LYRAQLSEKR AKRKAEKAAK KAAQKQQ* 

ORP78ng-l and ORF78-1 show 88.1% identity in 227 aa overlap: 

10 20 30 40 50 60 

MFAFLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG 
I I I : I I I I 1 I I I I I I M I M I I I I I I I I I I I I I I I I I I i I i I I I I I I I I ! I I I I I I I I I I 
MFALLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG 
10 20 30 40 50 60 

70 80 90 100 110 120 

VLVGDGIMFAAGRIWGQKILRFKPIARIMTPKRYEQVQEKFDKYGNWVLFVARFLPGLRT 

VLAGDGVMFAAGRIWGQKILKFKPIARIMTPKRYAQVQEKFDKYGNWVLFVARFLPGLRT 
70 80 90 100 110 120 



orf 78-1. pep 
orf78ng-l 

orf 78-1. pep 
orf78ng-l 



130 140 150 160 170 180 

orf 78-1. pep AVFVTAGISRKVSYLRFIIMDGLAALISVPIWIYLGEYGAHNIDWLMAKMHSLQSGIFVI 

I ! I I I I I I I I I I I I I I I : I I I I I I I I I I I I : I I I I I I I I I I I I I I I 1 I I I I I I I I I I I : 
orf78ng-l AVFVTAGISRKVSYLRFLIMDGLAALISVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIA 

130 140 150 160 170 180 



orf 7 8-1 .pep 
orf78ng-l 



LGIGATWAWIWWKKRQRIQFYRSKLKEKRAQRKAAKAAKKAAQSKQX 



Furthermore, orf78ng-l shows homology to the dedA protein from H. influenzae: 

sp|P45280iYG29_HAEIN HYPOTHETICAL PROTEIN HI1629 >gi 1 1073983 | pir | | D64133 dedA 
protein (dedA) homolog - Haemophilus influenzae (strain Rd KW20) 
>gi 11574476 (U32836) dedA protein (dedA) [Haemophilus influenzae] Length = 212 
Score = 223 bits (563), Expect = 7e-58 

Identities = 108/182 (59%), Positives = 140/182 (76%), Gaps = 2/182 (1%) 



Query: 5 

Sbjct: 21 

Query: 63 

Sbjct: 81 

Sbjct: 



LEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGM— GYTNPHIMFAVGMLGVL 62 
L FF EYGY AV FVL+ICGFGVPIPED+TLV+GGVI+G+ N H+M V M+GVL 

21 LIGFFTEYGYWAVLFVLIICGFGVPIPEDITLVSGGVIAGLYPENVNSHLMLLVSMIGVL 80 

63 AGDGVMFAAGRIWGQKILKFKPIARIMTPKRYAQVQEKFDKYGNWVLFVARFLPGLRTAV 122 

AGD M+ GRI+G KIL+F+PI RI+T +R V+EKF +YGN VLFVARFLPGLR + 
81 AGDSCMYWLGRIYGTKILRFRPIRRIVTLQRLRMVREKFSQYGNRVLFVARFLPGLRAPI 140 

123 FVTAGISRKVSYLRFLIMDGLAALISVPWIYLGEYGAHNIDWLMAKMHSLQSGIFIALG 182 

++ +GI+R+VSY+RF+ ++D AA+ISVP+WIYLGE GA N+DWL ++ Q I + I +G 
141 YMVSGITRRVSYVRFVLIDFCAAIISVPIWIYLGELGAKNLDWLHTQIQKGQIVIYIFIG 200 



183 VL 184 
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Based on this analysis, including the presence of putative transmembrane domains, it is predicted 
that these proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful 
antigens for vaccines or diagnostics, or for raising antibodies. 

Example 87 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 729>: 

1 ATGAARAAAT TATTGGCGGC CGTGATGATG GCAGGTTTGG CAGGCGCGGT 

51 TTCCGCCGCC GGAGTCCACG TTGAGGACGG CTGGGCGCGC ACCACCGTCG 

101 AAGGTATGAA AATAGGCGGC GCGTTCATGA AAATCCACAA CGACGAAGCC 

151 AAACAAGACT TTTTGCTCGG CGGAAGCAGC CCCGTTGCCG ACCGCGTCGA 

201 AGTGCATACC CACATCAACG ACAACGGCGT GATGCGGATG CGCGAAGTCG 

251 AAGGCGGCGT GCCTTTGGAA GCGAAATCCG TTACCGAACT CAAACCCGGC 

301 AGCTATCATG TGATGTTTAT GGGTTTGAAA AAACAATTAA AAGAGGGCGA 

351 TAAAATTCCC GTTACCCTGA AATTTAAAAA CGCCAAAGCG CAAACCGTCC 

401 AACTGGAAGT CAAAATCGCG CCGATGCCGG CAATGAACCA C... 

This corresponds to the amino acid sequence <SEQ ID 730; ORF79>: 



1 MKKLLAAVMM AGLAGA VSAA GVHVEDGWAR TTVEGMKIGG 1 
51 KQDFLLGGSS PVADRVEVHT HINDNGVMRM REVEGGVPLE AKSVTELKPG 
101 SYHVMFMGLK KQLKEGDKIP VTLKFKNAKA QTVQLEVKIA PMPAMNH. . 

Further work revealed the complete nucleotide sequence <SEQ ID 73 1>: 

1 ATGAAAAAAT TATTGGCGGC CGTGATGATG GCAGGTTTGG CAGGCGCGGT 

51 TTCCGCCGCC GGAGTCCACG TTGAGGACGG CTGGGCGCGC ACCACCGTCG 

101 AAGGTATGAA AATAGGCGGC GCGTTCATGA AAATCCACAA CGACGAAGCC 

151 AAACAAGACT TTTTGCTCGG CGGAAGCAGC CCCGTTGCCG ACCGCGTCGA 

2 01 AGTGCATACC CACATCAACG ACAACGGCGT GATGCGGATG CGCGAAGTCG 

251 AAGGCGGCGT GCCTTTGGAA GCGAAATCCG TTACCGAACT CAAACCCGGC 

301 AGCTATCATG TGATGTTTAT GGGTTTGAAA AAACAATTAA AAGAGGGCGA 

351 TAAAATTCCC GTTACCCTGA AATTTAAAAA CGCCAAAGCG CAAACCGTCC 

4 01 AACTGGAAGT CAAAATCGCG CCGATGCCGG CAATGAACCA CGGTCATCAC 

4 51 CACGGCGAAG CGCATCAGCA CTAA 

This corresponds to the amino acid sequence <SEQ ID 732; ORF79-l>: 

1 MKKLLAAVMM AGLAGA VSAA GVHVEDGWAR TTVEGMKIGG AFMKIHNDEA 
51 KQDFLLGGSS PVADRVEVHT HINDNGVMRM REVEGGVPLE AKSVTELKPG 
101 SYHVMFMGLK KQLKEGDKIP VTLKFKNAKA QTVQLEVKIA PMPAMNHGHH 
151 HGEAHQH* 

Computer analysis of this amino acid sequence revealed a putative leader peptide and also gave the 
following results: 

Homology with a predicted ORE from N.meninsitidis (strain A) 

ORF79 shows 94.6% identity over a 147aa overlap with an ORE (ORF79a) from sfrain A oiN. 
meningitidis: 

10 20 30 40 50 60 

or f 7 9 . pep MKKLLAAVMMAGLAGA VSAAGVHVEDGWARTTVEGMKIGGAFMKIHNDEAKQDFLLGGSS 

or f 7 9a MKXLLAAVMMAGLAGA VSAAGIHVEDGWARTTVEGMKMGGAFMKIHNDEAKQDFLLGGSS 



PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIP 
PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGXKKQLKXGDKIP 
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orf 7 9 . pep VTLKFKNAKAQTVQLEVKIAPMPAMNH 

or f 7 9a VTLKFKNAKAQTVQLEVKTAPMSAMDHGHHHGEAHQHX 
130 140 150 

The complete length ORF79a nucleotide sequence <SEQ ID 733> is: 

1 ATGAAANAAC TATTGGCAGC CGTGATGATG GCAGGTTTGG CAGGCGCGGT 

51 TTCCGCCGCC GGAATCCACG TTGAGGACGG CTGGGCGCGC ACCACCGTCG 

101 AAGGTATGAA AATGGGCGGC GCGTTCATGA AAATCCACAA CGACGAAGCC 

151 AAACAAGACT TTTTGCTCGG CGGAAGCAGC CCTGTTGCCG ACCGCGTCGA 

201 AGTGCATACC CATATCAATG ATAACGGTGT GATGCGGATG CGCGAAGTCG 

251 AAGGCGGCGT GCCTTTGGAG GCGAAATCCG TTACCGAACT CAAACCCGGC 

301 AGCTATCATG TCATGTTTAT GGGTNTGAAA AAACAATTAA AAGANGGCGA 

351 CAAGATTCCC GTTACCCTGA AATTTAAAAA CGCCAAAGCA CAAACCGTCC 

4 01 AACTGGAAGT CAAAACCGCG CCGATGTCGG CAATGGACCA CGGTCATCAC 

451 CACGGCGAAG CGCATCAGCA CTAA 

This encodes a protein having amino acid sequence <SEQ ID 734>: 

1 MKXLLAAVMM AGLAGA VSAA GIHVEDGWAR TTVEGMKMGG AFMKIHNDEA 
51 KQDFLLGGSS PVADRVEVHT HINDNGVMRM REVEGGVPLE AKSVTELKPG 
101 SYHVMFMGXK KQLKXGDKIP VTLKFKNAKA QTVQLEVKTA PMSAMDHGHH 
151 HGEAHQH* 

ORF79a and ORF79-1 show 94.9% identity in 157 aa overlap: 

10 20 30 40 50 60 

orf 7 9a . pep MKXLLAAVMMAGLAGAVSAAGIHVEDGWARTTVEGMKMGGAFMKIHNDEAKQDFLLGGSS 

orf 7 9-1 MKKLLAAVMMAGLAGAVSAAGVHVEDGWARTTVEGMKIGGAFMKIHNDEAKQDFLLGGSS 



orf 7 9a pep PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGXKKQLKXGDKIP 

I I I I I I I I I I I I I I I ! I M I I I 1 ! I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 7 9-1 PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIP 
70 80 90 100 110 120 

130 140 150 

orf 7 9a . pep VTLKFKNAKAQTVQLEVKTAPMSAMDHGHHHGEAHQHX 

I I I I I I I I I I I I I I I I I i III II : I I I II I II I I I I 
orf 7 9-1 VTLKFfCNAKAQTVQLEVKIAPMPAMNHGHHHGEAHQHX 
130 140 150 

Homology with a predicted ORF from N.sonorrhoeae 

ORF79 shows 96.1% identity over 76 aa overlap with a predicted ORF (ORF79ng) from 
N. gonorrhoeae: 

orf 7 9 .pep FMKIHNDEAKQDFLLGGSSPVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGS 101 
orf7 9ng INDNGVMRMREVKGGVPLEAKSVTELKPGS 30 

orf 7 9. pep YHVMFMGLKKQLKEGDKIPVTLKFKNAKAQTVQLEVKIAPMPAMNH 14 7 

orf79ng YHVMFMGLKKQLKEGDKIPVTLKFKNAKAQTVQLEVKTAPMSAMNHGHHHGEAHQH 86 

An ORF79ng nucleotide sequence <SEQ ID 73 5> was predicted to encode a protein comprising 
amino acid sequence <SEQ ID 736>: 

1 . . INDNGVMRMR EVKGGVI 
51 TLKFKNAKAQ TVQLEVI 

Further work revealed the complete gonococcal DNA sequence <SEQ ID 737>: 
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1 ATGAAAAAAT TATTGGCAGC CGTGATGATG GCAGGTTTGG CAGGCGCGGT 

51 TTccgccgCc GGagTccAtG TCGAggACGG CTGGGCGCGc accaCTGtcg 

101 aaggtATgaa aatggGCGGC GCgttCATga aaATCCACAA CGACGaaGcc 

151 atacaaGACt ttgtgcTCgg CGGaagcatg cccgttgccg accgcGTCGA 

201 AGTGCAtaca cacATCAACG ACAACGGCGT GATGCGTATG CGCGAAGTCA 

251 AAGGCGGCGT GCCTTTGGAG GCGAAATCCG TTACCGAACT CAARCCCGGC 

301 AGCTATCACG TGATGTTTAT GGGTTTGAAA AAACAACTGA AAGAGGGCGA 

351 CAAGATTCCC GTTACCCTGA AATTTAAAAA CGCCAAAGCG CAAACCGTCC 

401 AACTGGAAGT CAAAACCGCG CCGATGTCGG CAATGAACCA CGGTCATCAC 

451 CACGGCGAAG CGCATCAGCA CTAA 

This corresponds to the amino acid sequence <SEQ ID 738; ORF79ng-l>: 

1 MKKLLAAVMM AGLAGA V3AA GVHVEDGWAR TTVEGMKMGG AFMKIHNDEA 

51 IQDFVLGGSM PVADRVEVHT HINDNGVMRM REVKGGVPLE AKSVTELKPG 

101 SYHVMFMGLK KQLKEGDKIP VTLKFKNAKA QTVQLEVKTA PMSAMNHGHH 

151 HGEAHQH* 

ORF79ng-l and ORF79-1 show 95.5% identity in 157 aa overlap: 



MKKLLAAVMMAGLAGAVSAAGVHVEDGWARTTVEGMKIGGAFMKIHNDEAKQDFLLGGSS 
MKKLLAAVMMAGLAGAVSAAGVHVEDGWARTTVEGMKMGGAFMKIHNDEAIQDFVLGGSM 



orf7 9-l.pep PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIP 
orf7 9ng-l PVADRVEVHTHINDNGVMRMREVKGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIP 



orf7 9-l.pep VTLKFKNAKAQTVQLEVKIAPMPAMNHGHHHGEAHQHX 
orf 7 9ng-l VTLKFKNAKAQTVQLEVKTAPMSAMNHGHHHGEAHQHX 



130 



140 



150 



Furthermore, ORF79ng-l shows significant homology to a protein from Aquifex aeolicus: 

is] Length = 151 
= 1/114 (0%) 



gi 1 2983695 (AE000731) putative protein [Aquifex e 
Score =63.6 bits (152), Expect = 6e-10 
Identities = 38/114 (33%), Positives = 58/114 (50%), Gaps = 



Sbjc 



24 VEDGWARTTVEGMKMGGAFMKIHNDEAIQDFVLGGSMPVADRVEVHTHINDNGVMRMREV E 

V+ W G M I N+ D+++G +A RVE+H + +N V +M 

27 VKHPWVMEPPPGPNTTMMGMIIVNEGDEPDYLIGAKTDIAQRVELHKTVIENDVAKMVPQ f 



KGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIPVTLKFKNAKAQTVQLEV 137 
+ + + K E K YHVM +GLKK++KEGDK+ V L F+ + TV+ V 
ER-IEIPPKGKVEFKHHGYHVMIIGLKKRIKEGDKVKVELIFEKSGKITVEAPV 139 



Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useftil antigens for vaccines or diagnostics, or for raising antibodies. 

ORF79-1 (15.6kDa) was cloned in the pET vector and expressed in E.coli, as described above. The 
50 products of protein expression and purification were analyzed by SDS-PAGE. Figure 1 8A shows 
the results of affinity purification of the His-fiision protein. Purified His-fiision protein was used 
to immunise mice, whose sera were used for ELISA (positive result) and FACS analysis (Figure 
18B) These experiments confirm that ORF79-1 is a surface-exposed protein, and that it is a usefiil 
immunogen. 
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Example 88 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 
739>: 

1 ATGACGGTAA CTGCGGCCGA AGGCGGCAAA GCTGCCAAGG CGTTAAAAAA 

51 ATATCTGATT ACGGGCATTT TGGTCTGGCT GCCGATTGCG GTAACGGTTT 

101 GGGTGGTTTC CTATATCGTT TCCGCGTCCG ATCAGCTCGT CAACCTGCTG 

151 CCGAAGCAAT GGCGGCCGCA ATATGTTTTG GGGTTTAATA TCCCGGGGCT 

201 GGGCGTTATC GTTGCCATTG CCGTATTGTT TGTAACCGGA TTGTTTGCCG 

251 CCAACGTATT GGGTCGGCAG ATCCTCGCCG CGTGGGACAG CCTGTTGGGG 

301 CGGATTCCGG TTGTGAAAtC CATCTATTCG AGTGTGAAAA AAGTATCCGA 

351 ATacgTGCTG TCCGACAGCA GCCGTTCGTT TAAAACGCCG GTACTCGTGC 

401 CGTTTCCCCA GCCCGGTATT TGGACGATyG CTTTCGTGTC AGGGCAGGTG 

451 TCGAATGCGG TTAAGGCCGC ATTGCCGAAs GACGGCGATT ATCTTTCCGT 

501 GTATGTTCCG ACCACGCCGA ATCCGACCGG CGGTTACTAT ATTATGGTAA 

551 AGAAAAGCGA TGTGCGCGAA CTCGATATGA GCGTGGACGA AsCATTGAAA 

601 TATGTGATTT CGCTGGGTAT GGTCATCCCT GACGACCTGC CCGTCAAAAC 

651 ATTGGCAsGA CCTATGCCGT CTGAAAAGGC GGATTTGCCC GAACAACAAT 

701 AA 

This corresponds to the amino acid sequence <SEQ ID 740; ORF98>: 

1 MTVTAAEGGK AAKALKKYLI TGILVWLPIA VTVWWSYIV SASDQLVNLL 

51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLG 

101 RIPWKSIYS SVKKVSEYVL SDSSRSFKTP VLVPFPQPGI WTIAFVSGQV 

151 SNAVKAALPX DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEXLK 

201 YVISLGMVIP DDLPVKTLAX PMPSEKADLP EQQ* 

Further work revealed the complete nucleotide sequence <SEQ ID 741>: 

1 ATGACGGAAC nTGCGGCCGA AGGCGGCAAA GCTGCCAArG CGTTAAAAAA 

51 ATATCTGATT ACGGGCATTT TGGTCTGGCT GCCGATTGCG GTAACGGTTT 

101 GGGTGGTTTC CTATATCGTT TCCGCGTCCG ATCAGCTCGT CAACCTGCTG 

151 CCGAAGCAAT GGCGGCCGCA ATATGTTTTG GGGTTTAATA TCCCGGGGCT 

201 GGGCGTTATC GTTGCCATTG CCGTATTGTT TGTAACCGGA TTGTTTGCCG 

251 CCAACGTATT GGGTCGGCAG ATCCTCGCCG CGTGGGACAG CCTGTTGGGG 

301 CGGATTCCGG TTGTGAAATC CATCTATTCG AGTGTGAAAA AAGTATCCGA 

351 ATCGCTGCTG TCCGACAGCA GCCGTTCGTT TAAAACGCCG GTACTCGTGC 

401 CGTTTCCCCA GCCCGGTATT TGGACGATTG CTTTCGTGTC AGGGCAGGTG 

451 TCGAATGCGG TTAAGGCCGC ATTGCCGAAG GACGGCGATT ATCTTTCCGT 

501 GTATGTTCCG ACCACGCCGA ATCCGACCGG CGGTTACTAT ATTATGGTAA 

551 AGAAAAGCGA TGTGCGCGAA CTCGATATGA GCGTGGACGA AGCATTGAAA 

601 TATGTGATTT CGCTGGGTAT GGTCATCCCT GACGACCTGC CCGTCAAAAC 

651 ATTGGCAGGA CCTATGCCGT CTGAAAAGGC GGATTTGCCC GAACAACAAT 

701 AA 

This corresponds to the amino acid sequence <SEQ ID 742; ORF98-l>: 

1 MTEXAAEGGK AAKALKKYL I TGILVWLPIA VTVWW SYIV SASDQLVNLL 

51 PKQWRPQYVL GFNIPG LGVI VAIAVLFVTG LFA ANVLGRQ ILAAWDSLLG 

101 RIPWKSIYS SVKKVSESLL SDSSRSFKTP VLVPFPQPGI WTIAFVSGQV 

151 SNAVKAALPK DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK 

201 YVISLGMVIP DDLPVKTLAG PMPSEKADLP EQQ* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORE from N. meningitidis (strain Al 

ORF98 shows 96.1% identity over a 233aa overlap with an ORE (ORF98a) from sfrain A oiN. 
meningitidis: 

10 20 30 40 50 60 

orf 98 .pep MTVTAAEGGKAAKALKKYLITGILVWLPIAVTVWWSYIVSASDQLVNLLPKQWRPQYVL 

orf98a MTEPAAEGGKAAKALKKYLITGILVWLPIAVTVWWSYIVSASDQLVNLLPKQWRPQYVL 
10 20 30 40 50 60 
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GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPWKSIYSSVKKVSEYVL 
GFNIPGLGVIVAIAVLFVTGLFftANVLGRQILAAWDSLLGRIPWKSIYSSVKKVSXSLL 



orf 98 . pep SDSSRSFKTPVLVPFPQPGIWTIAFVSGQVSNAVKAALPXDGDYLSVYVPTTPNPTGGYY 
orf98a SDSSRSFKTPVLVPFPQSGIWTIAFVSGQVSNAVKAALPKDGDYLSVYVPTTPNPTGGYY 



IMVKKSDVRELDMSVDEXLKYVISLGMVIPDDLPVKTLAXPMPSEKADLPEQQX 
IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPSEKADLPEQQX 



The complete length ORF98a nucleotide sequence <SEQ ID 743> is: 



ATGACGGAAC 
ATATCTGATT 
GGGTGGTTTC 
CCGAAGCAAT 
GGGCGTTATC 
CAAACGTATT 
CGGATTCCGG 
NTCGTTGCTG 
CGTTTCCCCA 
TCGAATGCGG 
GTATGTTCCG 
AGAAAAGCGA 
TATGTGATTT 
ATTGGCAGGA 



CTGCGGCCGA 
ACGGGCATTT 
CTATATCGTT 
GGCGGCCGCA 
GTTGCCATTG 
GGGCCGGCAG 
TTGTGAAGTC 
TCCGACAGCA 
ATCGGGTATT 
TTAAGGCCGC 
ACCACGCCGA 
TGTGCGCGAA 
CGCTGGGTAT 
CCTATGCCGT 



AGGCGGCAAA 
TGGTCTGGCT 
TCCGCGTCCG 
ATATGTTTTG 
CCGTATTGTT 
ATTCTTGCCG 
CATCTATTCG 
GCCGTTCGTT 
TGGACAATCG 
ATTGCCGAAG 
ATCCGACCGG 
CTCGATATGA 
GGTCATCCCT 
CTGAAAAGGC 



GCTGCCAAGG 
GCCGATTGCG 
ATCAGCTCGT 
GGGTTTAATA 
TGTAACCGGA 
CGTGGGACAG 
AGTGTGAAAA 
TAAAACACCA 
CATTCGTGTC 
GACGGCGATT 
CGGTTACTAT 
GCGTGGACGA 
GACGACCTGC 
GGATTTGCCC 



CGTTAAAAAA 
GTAACGGTTT 
CAACCTGCTG 
TCCCGGGGCT 
TTATTTGCCG 
CTTGTTGGGG 
AAGTATCCGA 
GTACTCGTGC 
CGGTCAGGTG 
ATCTTTCCGT 
ATTATGGTAA 
AGCGTTGAAA 
CCGTCAAAAC 
GAACAACAAT 



This encodes a protein having amino acid sequence <SEQ ID 744>: 

1 MTEPAAEGGK AAKALKKYL I TGILVWLPIA VTWVV SYIV SASDQLVNLL 

51 PKQWRPQYVL GFNIPG LGVI VAIAVLFVTG LFA ANVLGRQ ILAAWDSLLG 

101 RIPWKSIYS SVKKVSXSLL SDSSRSFKTP VLVPFPQSGI WTIAFVSGQV 

151 SNAVKAALPK DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK 

201 YVISLGMVIP DDLPVKTLAG PMPSEKADLP EQQ* 

ORF98a and ORF98-1 show 98.7% identity in 233 aa overlap: 



MTEPAAEGGKAAKALKKYLITGILVWLPIAVTVWWSYIVSASDQLVNLLPKQWRPQYVL 
MTEXAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL 



GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPWKSIYSSVKKVSXSLL 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I ! I I III 

GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPWKSIYSSVKKVSESLL 



)r f 98a . pep SDSSRSFKTPVLVPFPQSGIWTIAFVSGQVSNAVKAALPKDGDYLSVYVPTTPNPTGGYY 
II I II I II II I I I I I I I I I I I I I I I II I I I II I II I I II I I II I I I II I I I I II I I I I I 
>rf 98-1 SDSSRSFKTPVLVPFPQPGIWTIAFVSGQVSNAVKAALPKDGDYLSVYVPTTPNPTGGYY 



orf 98a. pep IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPSEKADLPEQQX 

I I II I II M II I II I I I I I I II 1 II II I II I I I I I I I II I I II I I I I I I I II II 

orf 98-1 IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPSEKADLPEQQX 
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Homology with a predicted ORF from N. gonorrhoeae 

ORF98 shows 95.3% identity over a 233 aa overlap with a predicted ORF (ORF98ng) from 
N. gonorrhoeae: 



10 



20 



50 



60 



MTVTAAEGGKAAKALKKYL I TG I LVWL P I AVT VW WS YI VS AS DQLVNLLPKQWRPQYVL 6 0 

MTEPAAEGGKRAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL 60 

GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPWKSIYSSVKKVSEYVL 120 

GraiPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLXRIPVVKSIY^ 120 

SDSSRSFKTPVLVPFPQPGIWTIAFVSGQVSNAVFCAALPXDGDYLSVYVPTTPNPTGGYY 180 

SDSSRSFKTPVLVPFPQSGIWTIAFVSGQVSNAVKAALPQDGDYLSVYVPTTPNPTGGYY 180 
I^WKKS DVRELDMSVDEXLKYVI SLGMVI PDDLPVKTLAXPMPSEKADLPEQQ 233 
IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPPEKAELPEQQ 233 

The complete length ORF98ng nucleotide sequence <SEQ ID 745> is predicted to encode a protein 
having amino acid sequence <SEQ ID 746>: 

1 MTEPAAEGGK AAKALKKYL I TGILVWLPIA VTVWW SYIV SASDQLVNLL 

51 PKQWRPQYVL GFNIPG LGVI VAIAVLFVTG LFA ANVLGRQ ILAAWDSLLX 

101 RIPVVKSIYS SVKKVSESLL SDSSRSFKTP VLVPFPQSGI WTIAFVSGQV 

151 SNAVKAALPQ DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK 

201 YVISLGMVIP DDLPVKTLAG PMPPEKAELP EQQ* 

Further work revealed the complete nucleotide sequence <SEQ ID 747>: 



orf 98 .pep 
orf98ng 
orf 98 .pep 
orf98ng 
orf 98 .pep 
orf98ng 
orf 98 .pep 
orf 98ng 



ATGACGGAAC 
ATATCTGATT 
GGGTGGTTTC 
CCGAAGCAAT 
CGGCGTTATT 
CAAACGTGTT 
cggaTTCCGG 
ATCGCTGCTG 
CGTTTCCCCA 
TCGAATGCGG 
GTATGTCCCG 
AGAAAAGCGA 
TATGTGATTT 
ATTGGCAGGA 



CTGCGGCCGA 
ACAGGCATTT 
CTATATCGTT 
GGCGGCCGCA 
GTTGCCATTG 
GGGCCGGCAG 
TTGTCAAATC 
TCCGACAGCA 
ATCGGGTATT 
TTAAGGCCGC 
ACCACGCCCA 
TGTGCGCGAA 
CGCTGGGTAT 
CCTATGCCGC 



AGGCGGCAAA 
TGGTCTGGCT 
TCCGCGTCCG 
ATATGTTTTG 
CCGTATTGTT 
ATTCTTGCCG 
CATCTATTCG 
GCCGTTCGTT 
TGGACAATCG 
ATTGCCGCAG 
ACCCGACCGG 
CTCGATATGA 
GGTCATCCCT 
CTGAAAAGGC 



GCTGCCAAGG 
GCCGATTGCG 
ACCAGCTTGT 
GGGTTTAATA 
TGTAACCGGA 
CGTGGGACAG 
AGTGTGAAAA 
TAAAACGCCG 
CATTCGTGTC 
GATGGCGATT 
CGGTTACTAT 
GCGTGGACGA 
GACGACCTGC 
GGAGTTGCCC 



CGTTAAAAAA 
GTAACGGTTT 
CAACCTGCTG 
TCCCCGGGCT 
TTATTTGCCG 
CCTGTTgggg 
AAGTATCCGA 
GTACTCGTGC 
CGGTCAGGTG 
ATCTTTCCGT 
ATTATGGTAA 
AGCGTTGAAA 
CCGTCAAAAC 
GAACAACAAT 



This corresponds to the amino acid sequence <SEQ ID 748; ORF98ng-l>: 

1 MTEPAAEGGK AAKALKKYL I TGILVWLPIA VTVWW SYIV SASDQLVNLL 

51 PKQWRPQYVL GFNIPG LGVI VAIAVLFVTG LFA ANVLGRQ ILAAWDSLLG 

101 RIPVVKSIYS SVKKVSESLL SDSSRSFKTP VLVPFPQSGI WTIAFVSGQV 

151 SNAVKAALPQ DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK 

201 YVISLGMVIP DDLPVKTLAG PMPPEKAELP EQQ* 

ORF98ng-l and ORF98-1 show 97.9% identity in 233 aa overlap: 



1-1 .pep MTEXAAEGGKAAKALKKYLITGILVWLPIAVTVWWSYIVSASDQLVNLLPKQWRPQYVL 
III I II I I II I I I I I I II II I II I I I II I I II I II II I M I I I II II I II II I I II I II 
ing-1 MTEPAAEGGKAAKALKKYLITGILVWLPIAVTVWWSYIVSASDQLVNLLPKQWRPQYVL 



70 80 90 100 110 120 

orf 98-1. pep GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPWKSIYSSVKKVSESLL 
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orf98ng-l GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWD3LLGRIPWKSIYSSVKKVSESLL 
70 80 90 100 110 120 

130 140 150 160 170 180 

orf 98-1. pep SDSSRSFKTPVLVPFPQPGIWTIAFVSGQVSNRVKAALPKDGDYLSVYVPTTPNPTGGYY 

orf98ng-l SDSSRSFKTPVLVPFPQSGIWTIAFVSGQVSNAVKAALPQDGDYLSVYVPTTPNPTGGYY 
130 140 150 160 170 180 

190 200 210 220 230 

orf 98-1 .pep IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPSEKADLPEQQX 

orf98ng-l IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPPEKAELPEQQX 
190 200 210 220 230 

Based on this analysis, including the fact that the putative transmembrane domains in the 
gonococcal protein are identical to the sequences in the meningococcal protein, it is predicted that 
the proteins from N. meningitidis and N.gonorrhoeae, and their epitopes, could be useful antigens 
for vaccines or diagnostics, or for raising antibodies. 



Example 89 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 749>: 

1 ATgAAAACGG TAGTCTGGAT TGTCGTCCTG TTTGCCGCCG CCGTCGGACT 

51 GGCGCTGGCT TCGGGCATTT ACACCGGCGA CGTGTATATC GTACTCGGAC 

101 AGACCATGCT CAGAATCAAC CTGCACGCCT TTGTGTTAGG TTCGCTGATT 

151 GCCGTCGTGG TGTGGTATTT CTTGTTTAAA TTCATTATCG G^GgTACTCA 

201 ATATCCCCGA AAAGATGCAG CGTTTCGGTT CGGCnCGTAA AGGCCkCAAG 

251 ssCGsGCTTG CCTTGAACAA GGCGGGTTTG GCGTATTTTG AAGGGCGTTT 

301 TGAAAAGGCG GAACTAGAAG CCTCACGCGT GTTGGTCAAC AAAGtAGGCC 

351 GaSAGACAAC CGGACTTTGG CATTGATGCT GrGCGCGCAC GCCGCCGGAC 

4 01 AGATGGAAAA CATCGAssTG CGCGACCGTT ATCTTGCGGA AATCGCCAAA 

451 CTGCCGGAAA AACAGCAGCT TTCCCGTTAT CTTTTGTTGG CGGAATCGGC 

501 GTTGAACCGG CGCGATTACG AAGCGGCGGA AGCCAATCTT CATGCGGCGG 

551 CGAAGATGAA TGCCAACCTT ACGCGCCTCG TGCGTCTGCA . ATTCGTTAC 

601 GCTTTCGACA GGGGCGACGC GTTGCAGGTT CTGGCAAAAA CCGAAAAACT 

651 TTCCAAGGCG GGCGCGTTGG GCAAATCGGA AATGGAACGG TATCAAAATT 

7 01 GGGCATATCC GTCGCCAGCT GGCGGATGCT GCCGATGCCG CCGCTTTGAA 

751 AACCTGCCTG AAGCGGATTC CCGACAGCCT CAAAAACGGG GAATTGAGCG 

801 TATCGGTTGC GGAAAAGTAC GAACGTTTGG GACTGTATGC CGATGCGGTC 

851 AAATGGGTCA AACAGCATTA TCCGCAsAAC CGCCGCCCCG AGCTTTTGGA 

901 AGCCTTTGTC GAARGCGTGC GCTTTTTGGG CGAGCGCGAA CAGCAGAAAG 

951 CCATCGATTT TGCCGATGCT TGGCTGAAAG AACAGCCCGA TAACGCGCTT 

1001 CTGCTGATGT ATCTCGGTCG GCTCGCCTTC GGCCGCAAAC TTTGGGGCAA 

1051 GGCAAAAGGC TACCTTGAAG CGAGCATTGC ATTAAAGCCG AGTATTTCCG 

1101 CGCGTTTGGT TCTAACAAAG GTTTTCGACG AAATCGGAGA ACCGCAGAAG 

1151 GCGGAGGCGC AC. . . 

This corresponds to the amino acid sequence <SEQ ID 750; ORF100>: 

1 MKTWWIWL FAAAVGLALA SGIYTGDVYI VLGQTMLRIN LHAFVLGSLI 

51 AVWWYFLFK FIIGVLNIPE KMQRFGSARK GXKXXLALNK AGLAYFEGRF 

101 EKAELEASRV LVNKVGRDNR TLALMLXAHA AGQMENIXXR DRYLAEIAKL 

151 PEKQQLSRYL LLAESALNRR DYEAAEANLH AAAKMNANLT RLVRLXIRYA 

201 FDRGDALQVL AKTEKLSKAG ALGKSEMERY QNWAYRRQLA DAADA7\ALKT 

251 CLKRIPDSLK NGELSV3VAE KYERLGLYAD AVKWVKQHYP XNRRPELLEA 

301 FVESVRFLGE REQQKAIDFA DAWLKEQPDN ALLLMYLGRL AFGRKLWGKA 

351 KGYLEASIAL KPSISARLVL TKVFDEIGEP QKAEAH... 

Further work revealed the complete nucleotide sequence <SEQ ID 75 1>: 

1 ATGAAAACGG TAGTCTGGAT TGTCGTCCTG TTTGCCGCCG CCGTCGGACT 

51 GGCGCTGGCT TCGGGCATTT ACACCGGCGA CGTGTATATC GTACTCGGAC 

101 AGACCATGCT CAGAATCAAC CTGCACGCCT TTGTGTTAGG TTCGCTGATT 

151 GCCGTCGTGG TGTGGTATTT CTTGTTTAAA TTCATTATCG GCGTACTCAA 
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1001 
1051 
1101 
1151 
1201 



TATCCCCGAA 
CCGCGCTTGC 
GAAAAGGCGG 
AGACAACCGG 
TGGAAAACAT 
CCGGAAAAAC 
GAACCGGCGC 
AGATGAATGC 
TTCGACAGGG 
CAAGGCGGGC 
CATACCGCCG 
TGCCTGAAGC 
GGTTGCGGAA 
GGGTCAAACA 
TTTGTCGAAA 
CGATTTTGCC 
TGATGTATCT 
AAAGGCTACC 
TTTGGTTCTA 
AGGCGCAGCG 
GCAGCGTTAG 



AAGATGCAGC 
CTTGAACAAG 
AACTAGAAGC 
ACTTTGGCAT 
CGAGCTGCGC 
AGCAGCTTTC 
GATTACGAAG 
CAACCTTACG 
GCGACGCGTT 
GCGTTGGGCA 
CCAGCTGGCG 
GGATTCCCGA 
AAGTACGAAC 
GCATTATCCG 
GCGTGCGCTT 
GATGCTTGGC 
CGGTCGGCTC 
TTGAAGCGAG 
GCAAAGGTTT 
CAACTTGGTT 
AGCAGCATAG 



GTTTCGGTTC 
GCGGGTTTGG 
CTCACGCGTG 
TGATGCTGGG 
GACCGTTATC 
CCGTTATCTT 
CGGCGGAAGC 
CGCCTCGTGC 
GCAGGTTCTG 
AATCGGAAAT 
GATGCTGCCG 
CAGCCTCAAA 
GTTTGGGACT 
CACAACCGCC 
TTTGGGCGAG 
TGAAAGAACA 
GCCTACGGCC 
CATTGCATTA 
TCGACGAAAT 
TTGGAAGCCG 
CTGA 



GGCGCGTAAA 
CGTATTTTGA 
TTGGTCAACA 
CGCGCACGCC 
TTGCGGAAAT 
TTGTTGGCGG 
CAATCTTCAT 
GTCTGCAACT 
GCAAAAACCG 
GGAACGGTAT 
ATGCCGCCGC 
AACGGGGAAT 
GTATGCCGAT 
GCCCCGAGCT 
CGCGAACAGC 
GCCCGATAAC 
GCAAACTTTG 
AAGCCGAGTA 
CGGAGAACCG 
TCTCCGATGA 



GGCCGCAAGG 
AGGGCGTTTT 
AAGAGGCCGG 
GCCGGACAGA 
CGCCAAACTG 
AATCGGCGTT 
GCGGCGGCGA 
TCGTTACGCT 
AAAAACTTTC 
CAAAATTGGG 
TTTGAAAACC 
TGAGCGTATC 
GCGGTCAAAT 
TTTGGAAGCC 
AGAAAGCCAT 
GCGCTTCTGC 
GGGCAAGGCA 
TTTCCGCGCG 
CAGAAGGCGG 
CGAACGTCAC 



This corresponds to the amino acid sequence <SEQ ID 752; ORF100-1>: 

1 MKTWWIVVL FAAAVGLALA SGIYTGDVYI VLGQTMLRIN LHAFVLGSLI 
51 AVWWYFLFK FIIGV LMIPE KMQRFGSARK GRKAALALNK AGLAYFEGRF 
101 EKAELEASRV LVNKEAGDNR TLALMLGAHA AGQMENIELR DRYLAEIAKL 
151 PEKQQLSRYL LLAE3ALNRR DYEAAEANLH AAAKMNANLT RLVRLQLRYA 
201 FDRGDALQVL AKTEKLSKAG ALGKSEMERY QNWAYRRQLA DAADAAALKT 
251 CLKRIPD3LK NGELSV3VAE KYERLGLYAD AVKWVKQHYP HNRRPELLEA 
301 FVESVRFLGE REQQKAIDFA DAWLKEQPDN ALLLMYLGRL AYGRKLWGKA 
351 KGYLEASIAL KPSISARLVL AKVFDEIGEP QKAEAQRNLV LEAVSDDERH 
401 AALEQHS* 

Computer analysis of this amino acid sequence gave the following resuhs: 
Homology with a predicted ORF from N.meninsitidis (strain A) 

ORFIOO shows 93.5% identity over a 386aa overlap with an ORF (ORFlOOa) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf 100 . pep MKTWWIWLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVWWYFLFK 

I I 1 I ! I I I I I I I ! I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

orf 100a MKTWWIWLFAAAXGLALASGIXTGDVYIVLGQTMLRINLHAFVLGSLIAWVWYFLFK 



orf 100 . pep FIIGVLNIPEKMQRFGSARKGXKXXLALNKAGLAYFEGRFEKAELEASRVLVNKVGRDNR 
orf 100a FIIGVLNXPEKMQRFGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLGNKEAGDNR 



orf 100 . pep TLALMLXAHAAGQMENIXXRDRYLAEIAKLEEKQQLSRYLLLAESALNRRDYEAAEANLH 
orf 100a TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 



orf 100. pep AAAKMNANLTRLVRLXIRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQLA 
orf 100a AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKXSKAGAXGKSEMERYQNWAYRRQLX 



.rf 100 . pep DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPXNRRPELLEA 
I I I I I I I I M I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
irflOOa DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 
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310 320 330 340 350 360 

orf 100 . pep FVESVRFLGEREQQKAIDFADAWLKEQPDNALLLMYLGRLAFGRKLWGKAKGYLEASIAL 

O r f 1 0 0 a FVE S VRFLGERDQQKAI DFADAWLKEQPDNALLLXYLGRLAYGRKLWGKAKG YLEAS lAL 

310 320 330 340 350 360 



370 380 
KPSISARLVLTKVFDEIGEPQKAEAH 
I I I I I I I I I I : I I I I I I I I I I I I I : 

KPSISARLVLAKVFDETGEPQICAEAQRNLVLA3VAEENRP3AETHX 
370 380 390 400 

The complete length ORFlOOa nucleotide sequence <SEQ ID 753> is: 

1 ATGAAAACGG TAGTCTGGAT TGTCGTCCTG TTTGCCGCCG CNNTCGGGCT 

51 GGCATTGGCG TCGGGCATTN ACACCGGCGA CGTGTATATC GTACTCGGAC 

101 AGACCATGCT CAGAATCAAC CTGCACGCCT TTGTGTTAGG TTCGCTGATT 

151 GCCGTCGTGG TGTGGTATTT CCTGTTCAAA TTCATCATCG GCGTACTCAA 

201 TANCCCCGAA AAGATGCAGC GTTTCGGTTC GGCGCGTAAA GGCCGCAAGG 

251 CCGCGCTTGC TTTGAACAAG GCGGGTTTGG CGTATTTTGA AGGGCGTTTT 

301 GAAAAGGCGG AACTTGAAGC CTCGCGCGTA TTGGGAAACA AAGAGGCGGG 

351 GGATAACCGG ACTTTGGCAT TGATGTTGGG CGCACATGCC GCCGGGCAGA 

4 01 TGGAAAACAT CGAGCTGCGC GACCGTTATC TTGCGGAAAT CGCCAAACTG 

451 CCGGAAAAGC AGCAGCTTTC CCGTTATCTT TTGTTGGCGG AATCGGCGTT 

501 GAACCGGCGC GATTACGAAG CGGCGGAAGC CAATCTTCAT GCGGCGGCGA 

551 AGATGAATGC CAACCTTACG CGCCTCGTGC GTCTGCAACT TCGTTACGCT 

601 TTCGACAGGG GCGACGCGTT GCAGGTTCTG GCAAAAACCG AAAAANTTTC 

651 CAAGGCGGGC GCGTNGGGCA AATCGGAAAT GGAACGGTAT CAAAATTGGG 

701 CATACCGCCG CCAGCTGNCG GATGCTGCCG ATGCCGCCGC TTTGAAAACC 

751 TGCCTGAAGC GGATTCCCGA CAGCCTCAAA AACGGGGAAT TGAGCGTATC 

801 GGTTGCGGAA AAGTACGAAC GTTTGGGACT GTATGCCGAT GCGGTCAAAT 

851 GGGTCAAACA GCATTATCCG CACAACCGCC GACCCGAACT TTTGGAAGCN 

901 TTTGTCGAAA GCGTGCGCTT TTTGGGCGAA CGCGATCAGC AGAAAGCCAT 

951 CGATTTTGCC GATGCTTGGC TGAAAGAACA GCCCGATAAT GCGCTTCTGC 

1001 TGANGTATCT CGGTCGGCTC GCCTACGGCC GCAAACTTTG GGGCAAGGCA 

1051 AAAGGCTACC TTGAAGCGAG CATTGCATTA AAGCCGAGTA TTTCCGCGCG 

1101 TTTGGTTCTG GCAAAGGTTT TTGACGAAAC CGGAGAACCG CAGAAGGCGG 

1151 AGGCGCAGCG CAACTTGGTT TTGGCAAGCG TTGCCGAGGA AAACCGNCCT 

1201 TCCGCCGAAA CCCATTGA 

This encodes a protein having amino acid sequence <SEQ ID 754>: 



orflOO.pep 
orflOOa 



MKTWWIWL FAAAXGLALA SGIXTGDVYI VLGQTMLRIN LHAFVLGSLI 
AVWWYFLFK FIIGV LMXPE KMQRFGSARK GRKAALALNK AGLAYFEGRF 
EKAELEASRV LGNKEAGDNR TLALMLGAHA AGQMENIELR DRYLAEIAKL 
PEKQQLSRYL LLAESALNRR DYEAAEANLH AAAKMNANLT RLVRLQLRYA 
FDRGDALQVL AKTEKXSKAG AXGKSEMERY QNWAYRRQLX DAADAAALKT 
CLKRIPDSLK NGELSVSVAE KYERLGLYAD AVKWVKQHYP HNRRPELLEA 
FVESVRFLGE RDQQKAIDFA DAWLKEQPDN ALLLXYLGRL AYGRKLWGKA 
KGYLEASIAL KPSISARLVL AKVFDETGEP QKAEAQRNLV LASVAEENRP 
SAETH* 



ORFlOOa and ORFlOO-1 show 95.1% identity in 405 aa overlap: 



MKTWWIWLFAAAXGLALASGIXTGDVYIVLGQTMLRINLHAFVLGSLIAVWWYFLFK 
MKTVVWIWLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVWWYFLFK 



FIIGVLNXPEKMQRFGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLGNKEAGDNR 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
FIIGVLNIPEKMQRFGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLVNKEAGDNR 



TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 
TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 
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AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKXSKAGAXGKSEMERYQNWAYRRQLX 
AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQLA 



orf 100a . pep DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 
orf 100-1 DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 



orf 100a . pep FVESVRFLGERDQQKAIDFADAWLKEQPDNALLLXYLGRLAYGRKLWGKAKGYLEASIAL 
orf 100-1 FVESVRFLGEREQQKAIDFADAWLKEQPDNALLLMYLGRLAYGRKLWGKAKGYLEASIAL 



KPSISARLVLAKVFDETGEPQKAEAQRNLVLASVAEENRPSA-ETHX 
KPSISARLVLAKVFDEIGEPQKAEAQRNLVLEAVSDDERHAALEQHSX 



Homology with a predicted ORF from N. gonorrhoeae 

ORFIOO shows 93.3% identity over a 386 aa overlap with a predicted ORF (ORFlOOng) from 
N.gonorrhoeae: 

orf 100 .pep MKTWWIWLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVWWYFLFK 60 

orflOOng MKTWWIWLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVWWYFLFK 60 

orf 100 . pep FIIGVLNIPEKMQRFGSARKGXKXXLALNKAGLAYFEGRFEKAELEASRVLVNKVGRDNR 120 

orflOOng FIIGVLNIPENMRRSGSARKGRKAALALNKAGLAYFEGRFEKAELEA3RVLGNKEAGDNR 120 

orf 100 . pep TLALMLXAHAAGQMENIXXRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 180 

orflOOng TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 180 

orf 100 . pep AAAKMNANLTRLVRLXIRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQLA 24 0 

I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I I I : I 
orflOOng AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQMA 240 

orf 100 . pep DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPXNRRPELLEA 300 

orflOOng DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 300 

orf 100 .pep FVESVRFLGEREQQKAIDFADAWLKEQPDNALLLMYLGRLAFGRKLWGKAKGYLEASIAL 360 

I I I ! I I ! I ! I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I 
orflOOng FVESVRFLGEREQQKAIDFADSWLKEQPDNALLLMYLGRLAYGRKLWGKAKGYLEASIAL 360 

orf 100. pep KPSISARLVLTKVFDEIGEPQKAEAH 38 6 

ill! 11111:11111 : : Mill: 
orflOOng KPSIPARLVLAKVFDETAQSQKAEAQRNLVLASVAGENRPSAETR 405 

The complete length ORFlOOng nucleotide sequence <SEQ ID 755> is: 



ATGAAAACGG 
GGCGCTGGCT 
AGACCATGCT 
GCCGTCGTGG 
TATCCCCGAA 
CCGCGCTTGC 
GAAAAGGCGG 
AGACAACCGG 
TGGAAAATAT 



TAGTCTGGAT 
TCGGGCATTT 
CAGAATCAAC 
TGTGGTATTT 
AATATGCGGC 
CTTGAATAAG 
AACTCGAAGC 
ACTTTGGCAT 
CGAGCTGCGC 



TGTTGTCCTG 
ACACCGGCGA 
CTGCACGCCT 
CCTGTTTAAA 
GTTCCGGTTC 
GCGGGTTTGG 
CTCTCGAGTG 
TGATGCTGGG 
GACCGTTATC 



TTTGCCGCCG 
CGTGTATATC 
TTGTGTTAGG 
TTCATCATCG 
GGCGCGGAAA 
CGTATTTCGA 
TTGGGCAACA 
CGCGCACGCG 
TTGCGGAAAT 



CCGTCGGACT 
GTACTCGGAC 
TTCGCTGATT 
GCGTACTCAA 
GGCCGCAAGG 
AGGGCGTTTT 
AAGAGGCCGG 
GCAGGACAGA 
CGCCAAACTG 
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CCGGAAAAAC 
AAACCGGCGC 
AGATGAATGC 
TTCGATCGGG 
CAAGGCGGGC 
CATACCGCCG 
TGCCTGAAGC 
GGTTGCGGAA 
GGGTCAAACA 
TTTGTCGAAA 
CGATTTTGCC 
TGATGTATCT 
AAAGGCTACC 
TTTGGTGTTG 
AAGCACAGCG 
TCCGCCGAAA 



AGCAGCTTTC 
GATTACGAAG 
CAACCTTACG 
GCGATGCGTT 
GCGTTGGGCA 
CCAGATGGCG 
GGATTCCCGA 
AAGTACGAAC 
GCATTATCCG 
GCGTGCGCTT 
GATTCTTGGC 
CGGCCGGCTC 
TTGAAGCGAG 
GCAAAGGTTT 
CAACTTGGTT 
CCCGTTGA 



CCGCTATCTT 
CGGCGGAAGC 
CGCCTCGTGC 
GCAGGTTCTG 
AATCGGAAAT 
GATGCTGCCG 
CAGCCTCAAA 
GTTTGGGACT 
CACAACCGCC 
TTTGGGCGAG 
TGAAAGAACA 
GCCTACGGCC 
TATTGCACTG 
TTGACGAAAC 
TTGGCAAGCG 



CTGCTGGCGG 
CAATCTTCAT 
GTCTGCAACT 
GCAAAAaccG 
GGAACGGTAT 
ATGCCGCCGC 
AACGGGGAAT 
GTATGCCGAT 
GCCCCGAGCT 
CGCGAACAGC 
GCCCGATAAC 
GCAAACTTTG 
AAGCCGAGTA 
CGCACAGTCG 
TTGCCGGGGA 



AATCGGCGTT 
GCGGCGGCGA 
TCGTTACGCC 
AAAAACTTTC 
CAAAATTGGG 
TTTGAAAACC 
TGagcGTATC 
GCGGTCAAAT 
TTTGGAAGCC 
AGAAAGCCAT 
GCGCTTCTGC 
GGGTAAGGCA 
TTCCGGCGCG 
CAAAAAGCCG 
AAACCGCCCT 



This encodes a protein having amino acid sequence <SEQ ID 756>: 



101 EKAELEASRV LGNKEAGDNR TLALMLGAHA AGQMENIELR 

151 PEKQQLSRYL LLAESALNRR DYEAAEANLH AAAKMNANLT 

201 FDRGDALQVL AKTEKLSKAG ALGKSEMERY QNWAYRRQMA 

251 CLKRIPDSLK NGELSVSVAE KYERLGLYAD AVKWVKQHYP 

301 FVESVRFLGE REQQKAIDFA DSWLKEQPDN ALLLMYLGRL 

351 KGYLEASIAL KPSIPARLVL AKVFDETAQS QKAEAQRNLV 

401 SAETR* 

ORFlOOng and ORFlOO-1 show 95.3% identity in 402 aa overlap: 



LHAFVLGSLI 
AGLAYFEGRF 
DRYLAEIAKL 
RLVRLQLRYA 
DAADAAALKT 
HNRRPELLEA 
AYGRKLWGKA 
LASVAGENRP 



orf 100-1 . pep MKTWWIWLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAWVWYFLFK 
orflOOng MKTWWIWLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVWWYFLFK 



orflOO-1 .pep 
orflOOng 



FIIGVLNIPEKMQRFGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLVNKEAGDNR 
FIIGVLNIPENMRRSGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLGNKEAGDNR 



orflOO-l.pep 



TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 
TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 



:rflOO-l.pep AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQLA 
irflOOng AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQMA 



orf 100-1. pep DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 
orflOOng DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 



irf 100-1. pep FVESVRFLGEREQQKAIDFADAWLKEQPDNALLLMYLGRLAYGRKLWGKAKGYLEASIAL 
I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I 
>rflOOng FVESVRFLGEREQQKAIDFADSWLKEQPDNALLLMYLGRLAYGRKLWGKAKGYLEASIAL 



orf 100-1 .pep KPSISARLVLAKVFDEIGEPQKAEAQRNLVLEAVSDDERHAALEQHSX 
orflOOn KPSIPARLVLAKVFDETAQSQKTiEAQRNLVLASVAGENRPSAETRX 
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370 380 390 400 

Based on this analysis, including the presence of a putative leader sequence, a putative 
transmembrane domain, and a RGD motif, it is predicted that the proteins from N. meningitidis and 
N.gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 



Example 90 

The following DNA sequence, beUeved to be complete, was identified in N.meningitidis <SEQ ID 
757> 



1 ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG 

51 GTTTGCAGGG CTGTTTTACC TGCCGAGGAT TTTCGTCAAT ATGGCGATGA 

101 TTGATGTGCC GCGCGGCAAT CCCGAGTATG TGCGTCTGTC GGGCATGGCG 

151 GTGCGGCTGT ACCGTTTTAT GTCGCCGTTG GGCTTCGGCG CGGTCGTGTT 

201 CGGCGCGGCG ATACCGTTTG CCGCCGGCTG GTGGGGCAGC GGCTGGGTAC 

251 ACGTCAAACT GTGTTTGGGC TTGATGCTCT TGGCTTACCA GTTGTATTGC 

301 GGCGTGCTGC TGCGCCGTTT TCAGGATTAC AGCAATGCTT TTTCACACCG 

351 CTGGTACCGC GTGTTCAACG AAATCCCCGT GCTGCTGATG GTTGCCGCGC 

4 01 TGTATsTGGT CGTGTTCAAA CCGTTTTGA 

This corresponds to the amino acid sequence <SEQ ID 758; ORF102>: 

1 MMFSWFKLFH LFFVISWFAG LFYLPRIFVN MAMIDVPRGN PEYVRLSGMA 
51 VRLYRFMSPL GFGAVVFGAA IPFAAGWWGS GWVHVKLCLG LMLLAYQLYC 
101 GVLLRRFQDY SNAFSHRWYR VFNEIPVLLM VAALYXWFK PF* 

Further work revealed the complete nucleotide sequence <SEQ ID 759>: 

1 ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG 

51 GTTTGCAGGG CTGTTTTACC TGCCGAGGAT TTTCGTCAAT ATGGCGATGA 

101 TTGATGTGCC GCGCGGCAAT CCCGAGTATG TGCGTCTGTC GGGCATGGCG 

151 GTGCGGCTGT ACCGTTTTAT GTCGCCGTTG GGCTTCGGCG CGGTCGTGTT 

2 01 CGGCGCGGCG ATACCGTTTG CCGCCGGCTG GTGGGGCAGC GGCTGGGTAC 

251 ACGTCAAACT GTGTTTGGGC TTGATGCTCT TGGCTTACCA GTTGTATTGC 

301 GGCGTGCTGC TGCGCCGTTT TCAGGATTAC AGCAATGCTT TTTCACACCG 

351 CTGGTACCGC GTGTTCAACG AAATCCCCGT GCTGCTGATG GTTGCCGCGC 

401 TGTATCTGGT CGTGTTCAAA CCGTTTTGA 

This corresponds to the amino acid sequence <SEQ ID 760; ORF102-1>: 

1 MMFSWFKLFH LFFVISWFAG LFYLPRIFVN MAM IDVPRGN PEYVRLSGMA 
51 VRLYRFMSP L GFGAVVFGAA IPFAAG WWGS GWVHVK LCLG LMLLAYQLYC 
101 GVLLRRFQDY SNAFSHRWYR VFNE IPVLLM VAALYLWFK P F* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with HP1484 hypothetical integral membrane protein of//, pylori (accession number AE000647) 
ORF102 and HP1484 show 33% aa identity in 143aa overlap: 

orfl02 3 FSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPLGF 62 

F W K FH+ VISW A LFYLPR+FV A + V++ +LY F++ 

HP1484 8 FLWVKAFHVIAVISWMAALFYLPRLFVYHAENAHKKEFVGWQIQEK--KLYSFIASPAM 65 

orfl02 63 GAWFGAAIPFAAG WWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWY 119 

G + + + GW+H KL L ++LLAY YC +R + + R+Y 

HP1484 66 GFTLITGILMLLIEPTLFKSGGWLHAKLALWLLLAYHFYCKKCMRELEKDPTRRNARFY 125 

orfl02 120 RVFNEIPXXXXXXXXXXXXFKPF 142 

RVFNE P KPF 
HP1484 126 RVFNEAPTILMILIVILVWKPF 148 
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Homology with a predicted ORF from N. meningitidis (strain A) 

ORF102 shows 99.3% identity over a 142aa overlap with an ORF (ORF 102a) from strain A of A^. 



orf 102 . pep MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I 1 
orf 102a MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL 



orf 102 . pep GFGAWFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 
O r f 1 0 2 a GFGAWFGAAI PFAAGWWGSGWVHVKLCLGLMLLAYQL YCGVLLRRFQDYSNAFSHRWYR 



orf 102. pep VFNEIPVLLMVAALYXWFKPFX 

orfl02a VFNEIPVLLMVAALYLWFKPFX 
130 140 

The complete length ORF 102a nucleotide sequence <SEQ ID 761 > is: 

1 ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG 

51 GTTTGCAGGG CTGTTTTACC TGCCGAGGAT TTTCGTCAAT ATGGCGATGA 

101 TTGATGTGCC GCGCGGCAAT CCCGAGTATG TGCGTCTGTC GGGCATGGCG 

151 GTGCGGCTGT ACCGTTTTAT GTCGCCGTTG GGCTTCGGCG CGGTCGTGTT 

201 CGGCGCGGCG ATACCGTTTG CCGCCGGCTG GTGGGGCAGC GGCTGGGTAC 

251 ACGTCAAACT GTGTTTGGGC TTGATGCTCT TGGCTTACCA GTTGTATTGC 

301 GGCGTGCTGC TGCGCCGTTT TCAGGATTAC AGCAATGCTT TTTCACACCG 

351 CTGGTACCGC GTGTTCAACG AAATCCCCGT GCTGCTGATG GTTGCCGCGC 

401 TGTATCTGGT CGTGTTCAAA CCGTTTTGA 

This encodes a protein having amino acid sequence <SEQ ID 762>: 

1 MMFSWFKLFH LFFVISWFAG LFYLPRIFVM MAMIDVPRGH PEYVRLSGMA 



ORF 102a and ORF 102-1 show complete identity in 142 aa overlap: 

10 20 30 40 50 60 

orf 102a . pep MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL 

orf 102-1 MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL 
10 20 30 40 50 60 

70 80 90 100 110 120 

orf 102a . pep GFGAWFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 

orfl02-l G FGAWFGAAI PFAAGWWG SGWVHVKLCLGLMLLAYQLYCGVLLRRFQD YSNAFSHRWYR 

70 80 90 100 110 120 

130 140 
orfl02a.pep VFNEIPVLLMVAALYLWFKPFX 

orfl02-l VFNEIPVLLMVAALYLWFKPFX 
130 140 

Homology with a predicted ORF from N.sonorrhoeae 

ORF102 shows 97.9% identity over a 142 aa overlap with a predicted ORF (ORF102ng) fromA^. 
gonorrhoeae: 
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orfl02.pep MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL 60 

orfl02ng MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDAPRGNPEYVRLSGMAVRLYRFMSPL 60 

orf 102 .pep GFGAWFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 120 

orfl02ng GFGAVVFGAAIPFAAGRWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 120 

orf 102. pep VFNEIPVLLMVAALYXWFKPF 142 

orfl02ng VFNEIPVLLMVAALYLWFKPF 142 

The complete length ORF102ng nucleotide sequence <SEQ ID 763> is: 



ATGATGTTTT CTTGGTTCAA 
GTTTGCAGGG CTGTTTTACC 
TTGATGCGCC GCGCGGCAAT 
GTGCGGTTGT ACCGTTTTAT 
CGGCGCGGCG ATACCGTTTG 
ACGTCAAACT GTGTTTGGGC 
GGCGTGCTGC TGCGCCGTTT 
CTGGTACCGC GTGTTCAAcg 
TGTATCTGGT CGTGTTCAAA 



GCTGTTTCAC 
TGCCGAGGAT 
CCCGAGTATG 
GTCGCCTTTG 
CCGCcggccg 
TTGATGCTCT 
TCAGGATTAC 
aAATCCCCGT 
CCGTTTTGA 



TTGTTTTTTG 
TTTCGTCAAT 
TGCGCCTGTC 
GGTTTCGGCG 
GTGGGGCagc 
TGGCTTATCA 
AGCAATGCTT 
GCTGCTGATG 



TCATTTCGTG 
ATGGCGATGA 
GGGGATGGCG 
CGGTCGTGTT 
ggctggGTTC 
GTTGTATTGC 
TTTCACACCG 
GTTGCCGCGC 



This encodes a protein having amino acid sequence <SEQ ID 764>: 

1 MMFSWFKLFH LFFVISWFAG LFYLPRIFVH MA MIDAPRGN PEYVRLSGMA 
51 VRLYRFMSP L GFGAVVFGAA IPFAAG RWGS GWVHVK LCLG LMLLAYQLYC 
101 GVLLRRFQDY SNAFSHRWYR VFNE IPVLLM VAALYLWFK P F* 

ORF102ng and ORF102-1 show 98.6% identity in 142 aa overlap: 



MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL 
MMFSWFKLFHLFFVI SWFAGLFYLPRI FVNMAMI DAPRGNPEYVRLSGMAVRLYRFMSPL 



orf 102-1 . pep GFGAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 
orfl02ng GFGAWFGAAIPFAAGRWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 



orfl02-l.pep 
orfl02ng 



VFNEIPVLLMVAALYLWFKPFX 
VFNE I PVLLMVAALYLWFKP FX 



In addition, ORF102ng shows significant homology to a membrane protein from H.pylori: 

.ntegral membrane protein 



gi 1 2314656 (AE000647) conserved hypotheti 
[Helicobacter pylori] Length = 148 
Score = 79.2 bits (192), Expect = le-14 
Identities = 50/147 (34%), Positives = f 



J/147 (46%), Gaps = 13/147 (8%) 



Sbj ct : 
Sb j ct : 
Sbjct: 



3 FSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDAPRGNPEYVRLSGMAVRLYRFMSPLGF 62 

F W K FH+ VISW A LFYLPR+FV A + V++ +LY F++ 

8 FLWVKAFHVIAVISWMAALFYLPRLFVYHAENAHKKEFVGVVQIQEK—KLYSFIASPAM 65 

63 GAWFGAAIP FAAGRWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFS 115 

G + + F +G GW+H KL L ++LLAY YC +R + + 
66 GFTLITGILMLLIEPTLFKSG GWLflAKLALWLLLAYHFYCKKCMRELEKDPTRRN 121 

116 HRWYRVFNEIPXXXXXXXXXXXXFKPF 142 

R+YRVFNE P KPF 
122 ARFYRVFNEAPTILMILIVILWVKPF 148 
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Based on this analysis, it is predicted that these proteins from N.meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 91 

The following partial DNA sequence was identified in N.meningitidis <SEQ ED 765>: 

1 ATGGCAAAAA TGATGAAATG GGCGGCTGTT GCGGCGGTCG CGGCGGCAGC 

51 GGTTTGGGGC GGATGGTCTT AACTGAAGCC CGAGCCGCAC GTGCTTGATA 

101 TTACGGAAAC GGTCAGGCGC GGC // 

//.. ATTTCGTTTA CGATTTTGTC CGAACCGGAT ACGCCGATTA AGGCGAAGCT 

51 CGACAGCGTC GACCCCGGGC TGACCACGAT GTCGTCGGGC GGTTACAACA 

101 GCAGTACGGA TACGGCTTCC AATGCGGTCT ACTATTATGC CCGTTCGTTT 

151 GTGCCGAATC CGGACGGCAA ACTCGCCACG GGGATGACGA CGCAGAATAC 

201 GGTTGAAATC GACGGCGTGA AAAATGTGCT GATTATTCCG TCGCTGACCG 

251 TGAAAAATCG CGGCGGCAAG GCGTTTGTGC GCGTGTTGGG TGCGGACGGC 

301 AAGGCGGCGG AACGCGAAAT CCGGACCGGT ATGAGAGACA GTATGAATAC 

351 CGAAGTAAAA AGCGGGTTGA AAGAGGGGGA CAAAGTGGTC ATCTCCGAAA 

401 TAACCGCCGC CGAGCAACAG GAAAGCGGCG AACGCGCCCT AGGCGGCCCG 

451 CCGCGCCGAT AA 

This corresponds to the amino acid sequence <SEQ ID 766; ORF85>: 

1 MAKMMKWAAV AAVAAA AVWG GWS . LKPEPH VLDITETVRR G 



201 I SFriXSEPDT 

251 PIKAKLDSVD PGLTTMSSGG YNSSTDTASN AVYYYARSFV PNPDGKLATG 

301 MTTQNTVEID GVKNVLIIPS LTVJCNRGGKA FVRVLGADGK AAEREIRTGM 

351 RDSMNTEVKS GLKEGDKVVI SEITAAEQQE SGERALGGPP RR* 

Further work revealed the ftirther partial nucleotide sequence <SEQ ID 767>: 

1 . . GTATCGGTCG GCGCGCAGGC ATCGGGGCAG ATTAAGATAC TTTATGTCAA 
51 ACTCGGGCAA CAGGTTAAAA AGGGCGATTT GATTGCGGAA ATCAATTCGA 
101 CCTCGCAGAC CAATACGCTC AATACGGAAA AATCCAAGTT GGAAACGTAT 
151 CAGGCGAAGC TGGTGTCGGC ACAGATTGCA TTGGGCAGCG CGGAGAAGAA 
201 ATATAAGCGT CAGGCGGCGT TATGGAAGGA AAACGCGACT TCCAAAGAGG 
251 ATTTGGAAAG CGCGCAGGAT GCGTTTGCCG CCGCCAAAGC CAATGTTGCC 
301 GAGCTGAAGG CTTTAATCAG ACAGAGCAAA ATTTCCATCA ATACCGCCGA 
351 GTCGGAATTG GGCTACACGC GCATTACCGC AACGATGGAC GGCACGGTGG 
401 TGGCGATTCT CGTGGAAGAG GGGCAGACTG TGAACGCGGC GCAGTCTACG 
451 CCGACGATTG TCCAATTGGC GAATCTGGAT ATGATGTTGA ACAAAATGCA 
501 GATTGCCGAG GGCGATATTA CCAAGGTGAA GGCGGGGCAG GATATTTCGT 
551 TTACGATTTT GTCCGAACCG GATACGCCGA TTAAGGCGAA GCTCGACAGC 
601 GTCGACCCCG GGCTGACCAC GATGTCGTCG GGCGGTTACA ACAGCAGTAC 
651 GGATACGGCT TCCAATGCGG TCTACTATTA TGCCCGTTCG TTTGTGCCGA 
701 ATCCGGACGG CAAACTCGCC ACGGGGATGA CGACGCAGAA TACGGTTGAA 
751 ATCGACGGCG TGAAAAATGT GCTGATTATT CCGTCGCTGA CCGTGAAAAA 
8 01 TCGCGGCGGC AAGGCGTTTG TGCGCGTGTT GGGTGCGGAC GGCAAGGCGG 
851 CGGAACGCGA AATCCGGACC GGTATGAGAG ACAGTATGAA TACCGAAGTA 
901 AAAAGCGGGT TGAAAGAGGG GGACAAAGTG GTCATCTCCG AAATAACCGC 
951 CGCCGAGCAA CAGGAAAGCG GCGAACGCGC CCTAGGCGGC CCGCCGCGCC 
1001 QATAR 

This corresponds to the amino acid sequence <SEQ ID 768; ORF85-l>: 

1 . .VSVGAQASGQ IKILYVKLGQ QVKKGDLIAE INSTSQTNTL NTEKSKLETY 
51 QAKLVSAQIA LGSAEKKYKR .QAALWKENAT SKEDLESAQD AFAAAKANVA 
101 ELKALIRQSK ISINTAESEL GYTRITATMD GTWAILVEE GQTVNAAQST 
151 PTIVQLANLD MMLNKMQIAE GDITKVKAGQ DISFTILSEP DTPIKAKLDS 
201 VDPGLTTMSS GGYNSSTDTA SNAVYYYARS FVPNPDGKLA TGMTTQNTVE 
251 IDGVKNVLII PSLTVKNRGG KAFVRVLGAD GKAAEREIRT GMRDSMNTEV 
301 KSGLKEGDKV VISEITAAEQ QESGERALGG PPRR* 

Computer analysis of this amino acid sequence gave the following results: 
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Homologv with a predicted ORF from N.menin^tidis (strain A) 

ORF85 shows 87.8% identity over a 41aa overlap and 99.3% identity over a 153aa overlap with 
an ORF (ORF85a) from strain A of A': meningitidis: 

10 20 30 40 

orf 8 5 . pep MAfMMKWAAVAAVAAAAVWGGWS-LKPEPHVLDITETVRRG 

I I I I I I I I I I I I I I I I I I I I I 1 I Mill:: II I I II II 

orf 85a MAKMMKWAAVAAVAAAAVWGGWSYLKPEPQAAYITETVRRGDISRTVSATGEISPSNLVS 
10 20 30 40 50 60 

// 

80 90 100 

orf 85 . pep ISFTILSEPDTPIKAKLDSVDPGLTTMSSG 

orf 85a TIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSSG 
210 220 230 240 250 260 

110 120 130 140 150 160 

orf 85. pep GYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVEIDGVKNVLIIPSLTVKNRGGK 

orf 85a GYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVEIDGVKNVLIIPSLTVKNRGGR 
270 280 290 300 310 320 

170 180 190 200 210 220 

orf 8 5. pep AFVRVLGADGKAAEREIRTGMRDSMNTEVKSGLKEGDKWISEITAAEQQESGERALGGP 

II I I I II I I II I II I I I I I I I I I I I I I I I I I I I I I I I II I I II I I I I I I I I II I II I I I I 
orf 8 5a AFVRVLGADGKAAEREIRTGMRDSMNTEVKSGLKEGDKWISEITAAEQQESGERALGGP 

330 340 350 360 370 380 

230 

orf 8 5. pep PRRX 

I I I I 

orf85a PRRX 
390 

The complete length ORF85a nucleotide sequence <SEQ ID 769> is: 

1 ATGGCAAAAA TGATGAAATG GGCGGCTGTT GCGGCGGTCG CGGCGGCAGC 

51 GGTTTGGGGC GGATGGTCTT ATCTGAAGCC CGAGCCGCAG GCTGCTTATA 

101 TTACGGAAAC GGTCAGGCGC GGCGACATCA GCCGGACGGT TTCTGCAACA 

151 GGGGAGATTT CGCCGTCCAA CCTGGTATCG GTCGGCGCGC AGGCATCGGG 

201 GCAGATTAAG AAACTTTATG TCAAACTCGG GCAACAGGTT AAAAAGGGCG 

251 ATTTGATTGC GGAAATCAAT TCGACCTCGC AGACCAATAC GCTCAATACG 

301 GAAAAATCCA AATTGGAAAC GTATCAGGCG AAGCTGGTGT CGGCACAGAT 

351 TGCATTGGGC AGCGCGGAGA AGAAATATAA GCGTCAGGCG GCGTTGTGGA 

401 AGGATGATGC GACCGCTAAA GAAGATTTGG AAAGCGCACA GGATGCGCTT 

451 GCCGCCGCCA AAGCCAATGT TGCCGAGCTG AAGGCTCTAA TCAGACAGAG 

501 CAAAATTTCC ATCAATACCG CCGAGTCGGA ATTGGGCTAC ACGCGCATTA 

551 CCGCAACGAT GGACGGCACG GTGGTGGCGA TTCTCGTGGA AGAGGGGCAG 

601 ACTGTGAACG CGGCGCAGTC TACGCCGACG ATTGTCCAAT TGGCGAATCT 

651 GGATATGATG TTGAACAAAA TGCAGATTGC CGAGGGCGAT ATTACCAAGG 

7 01 TGAAGGCGGG GCAGGATATT TCGTTTACGA TTTTGTCCGA ACCGGATACG 

751 CCGATTAAGG CGAAGCTCGA CAGCGTCGAC CCCGGGCTGA CCACGATGTC 

801 GTCGGGCGGC TACAACAGCA GTACGGATAC GGCTTCCAAT GCGGTCTACT 

851 ATTATGCCCG TTCGTTTGTG CCGAATCCGG ACGGCAAACT CGCCACGGGG 

901 ATGACGACGC AGAATACGGT TGAAATCGAC GGTGTGAAAA ATGTGCTGAT 

951 TATTCCGTCG CTGACCGTGA AAAATCGCGG CGGCAGGGCG TTTGTGCGCG 

1001 TGTTGGGTGC AGACGGCAAG GCGGCGGAAC GCGAAATCCG GACCGGTATG 

1051 AGAGACAGTA TGAATACCGA AGTAAAAAGC GGGTTGAAAG AGGGGGACAA 

1101 AGTGGTCATC TCCGAAATAA CCGCCGCCGA GCAGCAGGAA AGCGGCGAAC 

1151 GCGCCCTAGG CGGCCCGCCG CGCCGATAA 

This encodes a protein having amino acid sequence <SEQ ID 770>: 

1 MAKMMKWAAV AAVAAAA VWG GWSYLKPEPQ AAYITETVRR GDISRTVSAT 

51 GEISPSNLVS VGAQASGQIK KLYVKLGQQV KKGDLIAEIN STSQTNTLNT 

101 EKSKLETYQA KLVSAQIALG SAEKKYKRQA ALWKDDATAK EDLESAQDAL 

151 AAAKANVAEL KALIRQSKIS INTAESELGY TRITATMDGT WAILVEEGQ 

201 TVNAAQSTPT IVQLANLDMM LNJCMQIAEGD ITKVKAGQDI SFTILSEPDT 

251 PIKAKLDSVD PGLTTMSSGG YNSSTDTASN AVYYYARSFV PNPDGKLATG 

301 MTTQNTVEID GVKNVLIIPS LTVKNRGGRA FVRVLGADGK AAEREIRTGM 
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351 RDSMNTEVKS GLKEGDKVVI SEITAAEQQE SGERALGGPP RR* 

ORF85a and ORF85-1 show 98.2% identity in 334 aa overlap: 



PQAAYITETVRRGDISRTVSATGEISPSNLVSVGAQASGQIKKLYVKLGQQVKKGDLIAE 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
VSVGAQASGQIKILYVKLGQQVKKGDLIAE 



INSTSQTNTLNTEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKDDATAKEDLESAQD 
INSTSQTNTLNTEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKENATSKEDLESAQD 



orfSSa.pe 
orf85-l 



ALAAAKANVAELKALIRQSKISINTAESELGYTRITATMDGTVVAILVEEGQTVNAAQST 
AFAAAKANVAELKALIRQSKISINTAESELGYTRITATMDGTWAILVEEGQTVNAAQST 



PTIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSS 
PTIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSS 



)rf85a.pep 
)rf85-l 



GGYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVEIDGVfCNVLIIPSLTVKNRGG 
GGYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVEIDGVKNVLIIPSLTVKNRGG 



orfSSa.pep 
orf85-l 



RAFVRVLGADGKAAEREIRTGMRDSMNTEVKSGLKEGDKVVISEITAAEQQESGERALGG 
KAFVRVLGADGKAAEREIRTGMRDSMNTEVKSGLKEGDKVVISEITAAEQQESGERALGG 



orfSSa.pep 
orf85-l 



PPRRX 
MM 
PPRRX 



Figure 19D shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF85a.. 
Homology with a predicted ORF from N.sonorrhoeae 

ORF85 shows a high degree of identity with a predicted ORF (ORF85ng) from N. gonorrhoeae: 



GRF85 
ORF85ng 



1 MAKMMKWAAVAAVAAAAVWGGWS . LKPEPHVLDITETVRRG 40 

1 MAKMMKWAAVAAVAAAAVWGGWSYLKPEPQAAYITEAVRRGDISRTVSAT 50 



ORF85 ISFTILSEPDT 250 

ORFSSng 201 TVNAAQSTPTIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDT 250 

ORF85 251 PIKAKLDSVDPGLTTMSSGGYNSSTDTASNAVYYYARSFVPNPDGKLATG 300 

ORFSSng 251 PIKAKLDSVDPGLTTMSSGGYNSSTDTASNAVYYYARSFVPNPDGKLATG 300 

ORF85 301 MTTQNTVEIDGVKNVLIIPSLTVKNRGGKAFVRVLGADGKAAEREIRTGM 350 

ORFSSng 301 MTTQNTVEIDGVKNVLLIPSLTVKNRGGJCAFVRVLGADGKAVEREIRTGM 350 

0RF8S 152 RDSMNTEVKSGLKEGDKWISEITAAEQQESGERALGGPPRR 3 93 

ORFSSng 3S1 KDSMNTEVKSGLKEGDKWISEITAAEQQESGERALGGPPRR 393 
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The complete length ORF85ng nucleotide sequence <SEQ ID 77 1> is: 



ATGGCAAARA 
GGTTTGGGGC 
TTACGGAaac 
GgcgAGATTT 
GCAGATTAAA 
ATTTGATTGC 
GAAAAATCCA 
TGCATTGGGC 
AGGATGATGC 
GCCGCCGCCA 
CAAAATTTCC 
CCGCGACGAT 
ACTGTGAACG 
GGATATGATG 
TGAAGGCGGG 
CCGATTAAGG 
GTCGGGCGGC 
ATTATGCCCG 
ATGACGACGC 
TATTCCGTCG 
TGTTGGGTGC 
AAAGACAGTA 
AGTGGTCATC 
GCGCCCTAGG 



TGATGAAATG 
GGATGGTCTT 
ggTCAGGCGC 
CGCCGTCCAA 
AAGCTTTATG 
GGAAATCAAT 
AATTGGAAAC 
AGCGCGGAGA 
GACCTCTAAA 
AAGCCAATGT 
ATCAATACCG 
GGACGGCACG 
CGGCGCAGTC 
TTGAACAAAA 
GCAGGATATT 
CGAAGCTCGA 
TACAACAGCA 
TTCGTTTGTG 
AGAATACGGT 
CTGACCGTGA 
GGACGGCAAG 
TGAATACCGA 
TCCGAAATAA 
CGGCCCGCCG 



GGCGGCTGTT 
ATCTGAAGCC 
GGCGATATCA 
CCTGGTATCG 
TCAAACTCGG 
TCGACCACGC 
GTATCAGGCG 
AGAAATATAA 
GAAGATTTGG 
TGCCGAGTTG 
CCGAGTCGGA 
GTGGTGGCGA 
TACGCCGACG 
TGCAGATTGC 
TCGTTTACGA 
CAGCGTCGAC 
GTACGGATAC 
CCGAATCCGG 
TGAAATCGAC 
AAAATCGCGG 
GCAGTGGAAC 
AGTGAAAAGC 
CCGCCGCCGA 
CGCCGATAA 



GCGGCGGTCG 
CGAACCGCAG 
GCCGGACGGT 
GTCGGCGCGC 
GCAACAGGTC 
AGACCAACAC 
AAGCTGGTGT 
GCGTCAGGCG 
AAAGCGCGCA 
AAGGCTTTAA 
TTTGGGCTAC 
TTCCCGTGGA 
ATTGTCCAAT 
CGAGGGCGAT 
TTTTGTCCGA 
CCCGGGCTGA 
GGCTTCCAAT 
ACGGCAAACT 
GGTGTGAAAA 
CGGCAAGGCG 
GCGAAATCCG 
GGGTTGAAAG 
GCAGCAGGAA 



CGGCGGCaac 
GCTGCTTATA 
TTCCGCGACG 
AGGCTTCGGG 
AAAAAGGGCG 
GATCGATATG 
CGGCACAGAT 
GCGTTGTGGA 
GGATGCGCTT 
TCAGACAGAG 
ACGCGCATTA 
AGAGGGGCAG 
TGGCGAATCT 
ATTACCAAGG 
ACCGGATACG 
CCACGATGTC 
GCGGTCTATT 
CGCCACGGGG 
ATGTGTTGCT 
TTCGTACGCG 
GACCGGTATG 
AGGGGGACAA 
AGCGGCGAAC 



This encodes a protein having amino acid sequence <SEQ ID 772>: 

1 MAIMMKWAAV AAVAAA AVWG GWSYLKPEPQ AAYITEAVR R GD ISRTVSAT 

51 GEISPSNLVS VGAQASGQIK KLYVKLGQQV KKGDLIAEIN STTQTNTIDM 

101 EKSKLETYQA KLVSAQIALG SAEKKYKRQA ALWKDDATSK EDLESAQDAL 

30 151 AAAKANVAEL KALIRQSKIS INTAESDLGY TRITATMDGT WAIPVEEGQ 

201 TVNAAQSTPT IVQLANLDMM LNKMQIAEGD ITKVKAGQDI SFTILSEPDT 

251 PIKAKLDSVD PGLTTMSSGG YNSSTDTASN AVYYYARSFV PNPDGKLATG 

301 MTTQNTVEID GVKNVLLIPS LTVKNRGGKA FVRVLGADGK AVEREIRTGM 

351 KDSMNTEVKS GLKEGDKVVI SEITAAEQQE SGERALGGPP RR* 

35 ORF85ng and ORF85-1 show 96. 1 % identity in 334 aa overlap: 



orf 85ng 
orf85-l 



PQAAYITETVRRGDISRTVSATGEISPSNLVSVGAQASGQIKKLYVKLGQQVKKGDLIAE 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
VSVGAQASGQIKILYVKLGQQVKKGDLIAE 



orf 85ng 
orf85-l 



INSTTQTNTIDMEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKDDATSKEDLESAQD 
INSTSQTNTLNTEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKENATSKEDLESAQD 



orf 85ng 
orf85-l 



ALAAAKANVAELKALIRQSKI S INTAE SDLGYTRI TATMDGT WAI PVEEGQTVNAAQS T 
AFAAAKANVAELKALIRQSKISINTAESELGYTRITATMDGTWAILVEEGQTVNAAQST 



orf 85ng 
orf85-l 



PTIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSS 

I I I I I I I I I I I I I I I I I I I I M I I I I I 1 I I I I I I I I I I ! I I I I I ! I I I I ! I I I I I I I I I 

PTIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSS 



orf85ng 
orf85-l 



GGYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVEIDGVKNVLLIPSLTVKNRGG 
GGYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVEIDGVKNVLIIPSLTVKNRGG 
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KAFVRVLGADGKAVEREIRTGMKDSMNTEVKSGLKEGDKVVISEITAAEQQESGERALGG 
KAFVRVLGADGKAAEREIRTGMRDSMNTEVKSGLKEGDKVVISEITAAEQQESGERALGG 



orf85-l 



PPRRX 
PPRRX 



10 In addition, ORF85ng shows significant homology to an E.coli membrane fusion protein: 

gi 1 1787104 (AE000189) o380; 27% identical (27 gaps) to 33; 
membrane fusion protein precursor, MTRC_NEIGO SW: P43505 
colij Length = 380 
Score = 193 bits (485), Expect = 2e-48 
15 Identities = 120/345 (34%), Positives = 182/345 (51%), Gaps = 13/345 (3%) 

Query: 29 PQAAYITETVRRGDISRTVSATGEISPSNLVSVGAQASGQIKKLYVKLGQQVKKGDLIAE 88 

P Y T VR GD+ ++V ATG++ V VGAQ SGQ+K L V +G +VKK L+ 

Sbjct: 41 PVPTYQTLIVRPGDLQQSVLATGKLDALRKVDVGAQVSGQLKTLSVAIGDKVKKDQLLGV 100 

Query: 8 9 INSTTQTNTIDMEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKDDATSKEXXXXXXX 148 

1+ N I ++ L +A+ A+ L A Y RQ L + A S++ 

Sbjct: 101 IDPEQAENQIKEVEATLMELRAQRQQAEAELKLARVTYSRQQRLAQTKAVSQQDLDTAAT 160 

25 Query: 14 9 XXXXXXXXXXXXXXXIRQSKISINTAESDLGYTRITATMDGTWAIPVEEGQTVNAAQST 208 

I++++ S++TA+++L YTRI A M G V I +GQTV AAQ 
Sbjct: 161 EMAVKQAQIGTIDAQIKRNQASLDTAKTNLDYTRIVAPMAGEVTQITTLQGQTVIAAQQA 220 



209 PTIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDTPIjyiKLDSVDPGLTTMSS 268 

P 1+ LA++ ML K Q++E D+ +K GQ FT+L +P T + ++ VP 
221 PNILTLADMSAMLVKAQVSEADVIHLKPGQKAWFTVLGDPLTRYEGQIKDVLP 273 



Sbjct 
Sbjd 
Sbjct 

Based on this analysis, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



2 69 GGYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVEIDGVKNVLLIPSLTVKNRGG 328 

+ + ++A++YYAR VPNP+G L MT Q +++ VKNVL IP + + G 
274 TPEKVNDAIFYYARFEVPNPNGLLRLDMTAQVHIQLTDVKNVLTIPLSALGDPVG 328 

32 9 KAFVRV-LGADGKAVEREIRTGMKDSMNTEVKSGLKEGDKVVISE 372 

+V L +G+ ERE+ G ++ + E+ GL+ GD+VVI E 
329 DNRYKVKLLRNGETREREVTIGARNDTDVEIVKGLEAGDEWIGE 373 



ORF85-1 (40.4kDa) was cloned in the pGex vectors and expressed in E.coli, as described above. 
The products of protein expression and purification were analyzed by SDS-PAGE. Figure 19A 
shows the results of affinity purification of the GST-fusion protein. Purified GST-fusion protein 
45 was used to immunise mice, whose sera were used for Western blot (Figure 19B), FACS analysis 
(Figure 19C), and ELISA (positive resuh). These experiments confirm that ORF85-1 is a 
surface-exposed protein, and that it is a useful immunogen. 



Example 92 



The following partial DNA sequence was identified mN. meningitidis <SEQ ID 773>: 

1 ..ATTCCCGCCA CGATGACATT TGAACGCAGC GGCAATGCTT ACAAAATCGT 

51 TTCGACGATT AAAGTGCCGC TATACAATAT CCGTTTCGAG TCCGGCGGTA 

101 CGGTTGTCGG CAATACCCTG CACCCTACCT ACTATAGAGA CATACGCAGG 

151 GGCAAACTGT ATGCGGAAgc CAAATTCGCC GACgGcAGCG TAACTTACGG 

201 CAAAGCGGGC GAGAGCAAAA CCGAGCAAAG CCCCAAGGCT ATGGATTTGT 
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251 TCACGCTTGC CTGGCAGTTG GCGGCAAATG ACGCGAAACT CCCCCCGGGG 

301 CTGAAAATCA CCAACGGCAA AAAACTTTAT TCCGTCGGCG GTTTGAATAA 

351 GGCGGGTACA GGAAAATACA GCATAGGCGG CGTGGAAACC GAAGTCGTCA 

401 AATATCGGGT GCGGCGCGGC GACGATGCGG TAATGTATTT cTTCGCACCG 

451 TCCCTGAACA ATATTCCGGC ACAAATCGGC TATACCGACG ACGGCAAAAC 

501 CTATACGCTG AAACTCAAAT CGGTGCAGAT CAACGGCCAG GCAGCCAAAC 

551 CGTAA 

This corresponds to the amino acid sequence <SEQ ID 774; ORF120>: 



1 . . IPAIMTFERS GNAYKIVSTI KVPLYNIRFE SGGTVVGNTL HPTYYRDIRR 

51 GKLYAEAKFA DGSVTYGKAG ESKTEQSPKA MDLFTLAWQL AANDAKLPPG 

101 LKITNGKKLY SVGGLNKAGT GKYSIGGVET EWKYRVRRG DDAVMYFFAP 

151 SLNNIPAQIG YTDDGKTYTL KLKSVQINGQ AAKP* 

Further work revealed the complete nucleotide sequence <SEQ ID 775>: 

1 ATGATGAAGA CTTTTAAAAA TATATTTTCC GCCGCCATTT TGTCCGCCGC 

51 CCTGCCGTGC GCGTATGCGG CAGGGCTGCC CCAATCCGCC GTGCTGCACT 

101 ATTCCGGCAG CTACGGCATT CCCGCCACGA TGACATTTGA ACGCAGCGGC 

151 AATGCTTACA AAATCGTTTC GACGATTAAA GTGCCGCTAT ACAATATCCG 

201 TTTCGAGTCC GGCGGTACGG TTGTCGGCAA TACCCTGCAC CCTACCTACT 

251 ATAGAGACAT ACGCAGGGGC AAACTGTATG CGGAAGCCAA ATTCGCCGAC 

301 GGCAGCGTAA CTTACGGCAA AGCGGGCGAG AGCAAAACCG AGCAAAGCCC 

351 CAAGGCTATG GATTTGTTCA CGCTTGCCTG GCAGTTGGCG GCAAATGACG 

401 CGAAACTCCC CCCGGGGCTG AAAATCACCA ACGGCAAAAA ACTTTATTCC 

451 GTCGGCGGTT TGAATAAGGC GGGTACAGGA AAATACAGCA TAGGCGGCGT 

501 GGAAACCGAA GTCGTCAAAT ATCGGGTGCG GCGCGGCGAC GATGCGGTAA 

551 TGTATTTCTT CGCACCGTCC CTGAACAATA TTCCGGCACA AATCGGCTAT 

601 ACCGACGACG GCAAAACCTA TACGCTGAAA CTCAAATCGG TGCAGATCAA 

651 CGGCCAGGCA GCCAAACCGT AA 

This corresponds to the amino acid sequence <SEQ ID 776; ORF120-1>: 



1 MMKTFKMIFS AAILSAALPC AYA AGLPQSA VLHYSGSYGI PAIMTFERSG 

51 NAYKIVSTIK VPLYNIRFES GGTWGNTLH PTYYRDIRRG KLYAEAKFAD 

101 GSVTYGKAGE SKTEQSPKAM DLFTLAWQLA ANDAKLPPGL KITNGKKLYS 

151 VGGLNKAGTG KYSIGGVETE VVKYRVRRGD DAVMYFFAPS LNNIPAQIGY 

201 TDDGKTYTLK LKSVQINGQA AKP* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N.meninsitidis (strain A) 

ORF120 shows 92.4% identity over a 184aa overlap with an ORF (ORF120a) from sfrain A ofN. 
meningitidis: 

10 20 30 

orf 120 .pep IPATMTFERSGNAYKIVSTIKVPLYNIRFE 

orfl20a SAAILSAALPCAYAAGLPXSAVLHYSGSYGIPATXXXXXXXNAXKIVSTIKVPLYNIRFE 
10 20 30 40 50 60 

40 50 60 70 80 90 

orf 120. pep SGGTWGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAMDLFTLAWQL 
I I I I I I I I I I I I I I I I I I I I I I I ! I I I I 1 I I 1 I I I I I I I : I I I I I I I I I I I I I I I 
orf 120a SGGTWGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAXXXXXXQSPKAMDLFTLAWQL 



100 110 120 130 140 150 

orf 120 . pep AANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEWKYRVRRGDDAVMYFFAP 

orf 120a AANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEWKYRVRRGDDAVMYFFAP 
130 140 150 160 170 180 
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160 170 180 

orf 120 .pep SLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX 

orfl20a SLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX 
190 200 210 220 

The complete length ORF 120a nucleotide sequence <SEQ ID 777> is: 

1 ATGATGAAGA CTTTTAAAAA TATATTTTCC GCCGCCATTT TGTCCGCCGC 

51 CCTGCCGTGC GCGTATGCGG CAGGGCTGCC CNAATCCGCC GTGCTGCACT 

101 ATTCCGGCAG CTACGGCATT CCCGCCACNA NNANNTNNGN ACNNNGNGNC 

151 AATGCTTNCA AAATCGTTTC GACGATTAAA GTGCCGCTAT ACAATATCCG 

201 TTTCGAGTCC GGCGGTACGG TTGTCGGCAA TACCCTGCAC CCTACCTACT 

251 ATAGAGACAT ACGCAGGGGC AAACTGTATG CGGAAGCCAA ATTCGCCGAC 

301 GGCAGCGTAA CCTACGGCAA AGCGGNNNNN ANCNNNNNNG NGCAAAGCCC 

351 CAAGGCTATG GATTTGTTCA CGCTTGCNTG GCAGTTGGCG GCAAATGACG 

401 CGAAACTCCC CCCGGGGCTG AAAATCACCA ACGGCAAAAA ACTTTATTCC 

451 GTCGGCGGTT TGAATAAGGC GGGTACAGGA AAATACAGCA TAGGCGGCGT 

501 GGAAACCGAA GTCGTCAAAT ATCGGGTGCG GCGCGGCGAC GATGCGGTAA 

551 TGTATTTCTT CGCACCGTCC CTGAACAATA TTCCGGCACA AATCGGCTAT 

601 ACCGACGACG GCAAAACCTA TACGCTGAAA CTCAAATCGG TGCAGATCAA 

651 CGGCCAGGCA GCCAAACCGT AA 

This encodes a protein having amino acid sequence <SEQ ID 778>: 



1 MMKTFKNIFS AAILSAALPC AYA AGLPXSA VLHYSGSYGI PATXXXXXXX 

51 NAXKIVSTIK VPLYNIRFES GGTWGNTLH PTYYRDIRRG KLYAEAKFAD 

101 GSVTYGKAXX XXXXQSPKAM DLFTLAWQLA ANDAKLPPGL KITNGKKLYS 

151 VGGLNKAGTG KYSIGGVETE WKYRVRRGD DAVMYFFAPS LNNIPAQIGY 

201 TDDGKTYTLK LKSVQINGQA AKP* 

ORF120a and ORF120-1 show 93.3% identity in 223 aa overlap: 



MMKTFKNIFSAAILSAALPCAYAAGLPXSAVLHYSGSYGIPATXXXXXXXNAXKIVSTIK 
MMKTFKNIFSAAILSAALPCAYAAGLPQSAVLHYSGSYGIPATMTFERSGNAYKIVSTIK 



orf 120a . pep VPLYNIRFESGGTWGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAXXXXXXQSPKAM 

I I I j ! I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ] I I : MINI 
orf 120-1 VPLYNIRFESGGTWGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAM 



DLFTLAWQLAANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEWKYRVRRGD 
DLFTLAWQLAANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEWKYRVRRGD 



orf 120a . pep DAVMYFFAPSLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX 

I I ! I I I I I I 1 I I I I I I I I ] I I I I i I ! I I I I I I I I I I I I I I I ! I I 

orf 120-1 DAVMYFFAPSLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX 



Homology with a predicted ORF from N.sonorrhoeae 

ORF120 shows 97.8% identity over 184 aa overlap with a predicted ORF (ORF120ng) from 
N. gonorrhoeae: 



orf 120. pep 


IPATMTFERSGNAYKIVSTIKVPLYNIRFE 


30 


orfl20ng 


1 1 ! 1 1 1 1 1 1 1 1 1 1 ! 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

SAAILSAALPCAYAARLPQSAVLHYSGSYGIPATMTFERSGNAYKIVSTIKVPLYNIRFE 


69 


orfl20.pep 


SGGTWGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAGE3KTEQSPKAMDLFTLAWQL 


90 


orfl20ng 


1 1 1 1 1 1 1 1 1 1 1 1 : 1 1 : 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 I 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 I 1 1 1 
SGGTWGNTLHPAYYKDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAMDLFTLAWQL 


129 
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orfl20.pep AANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEWKYRVRRGDDAVMYFFAP 
I M I I I I I i I I I I I I I I I I I I I I M I I I I I ! I I I I I I I I I I I I I I I I I I I I I : I I I I I I 
orfl20ng AANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEWKYRVRRGDDTVTYFFAP 

orfl20.pep SLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKP 184 

I I I I I I I I I I I I 1 I I I I I I I I I I I I I I ! I I I I I 
orfl20ng SLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKP 223 



The complete length ORF120ng nucleotide sequence <SEQ ID 779> is: 



ATGATGAAGA 
CCTGCCGTGC 
ATTCCGGCAG 
AATGCTTACA 
TTTCGAATCC 
ATAAAGACAT 
GGCAGCGTAA 
CAAGGCTATG 
CGAAACTCCC 
GTCGGCGGCC 
GGAAACCGAA 
CGTATTTCTT 
ACCGACGACG 
CGGACAGGCC 



CTTTTAAAAA 
GCGTATGCGG 
CTACGGCATT 
AAATCGTTTC 
GGCGGTACGG 
ACGCAGGGGC 
CCTACGGCAA 
GATTTGTTCA 
CCCGGGTCTG 
TGAATAAGGC 
GTCGTCAAAT 
CGCACCGTCC 
GCAAAACCTA 
GCCAAACCGT 



TATATTTTCC 
CAAGGCTACC 
CCCGCCACGA 
GACGATTAAA 
TTGTCGGCAA 
AAACTGTATG 
AGCGGGCGAG 
CGCTTGCCTG 
AAAATCACCA 
GGGTACGGGA 
ATCGGGTGCG 
CTGAACAATA 
TACGCTGAAG 
AA 



GCCGCCATTT 
CCAATCCGCC 
TGACATTTGA 
GTGCCGCTAT 
TACCCTGCAC 
CGGAAGCCAA 
AGCAAAACCG 
GCAGTTGGCG 
ACGGCAAAAA 
AAATACAGCA 
GCGCGGCGAC 
TTCCGGCACA 
CTCAAATCGG 



TGTCCGCCGC 
GTGCTGCACT 
ACGCAGGGGC 
ACAATATCCG 
CCTGCCTACT 
ATTCGCCGAC 
AGCAAAGCCC 
GCAAATGACG 
ACTTTATTCC 
TaggCGGCGT 
GATACGGTAA 
AATCGGCTAT 
TGCAGATCAA 



This encodes a protein having amino acid sequence <SEQ ID 780>: 

1 MMKTFKNIFS AAILSAALPC AYA ARLPQSA VLHYSGSYGI PATMTFERSG 

51 NAYKIVSTIK VPLYNIRFES GGTWGNTLH PAYYKDIRRG KLYAEAKFAD 

101 GSVTYGKAGE SKTEQSPKAM DLFTLAWQLA ANDAKLPPGL KITNGKKLYS 

151 VGGLNKAGTG KYSIGGVETE VVKYRVRRGD DTVTYFFAPS LNNIPAQIGY 

201 TDDGKTYTLK LKSVQINGQA AKP* 

In comparison with ORF120-1, ORF120ng shows 97.8% identity in 223 aa overlap: 

10 20 30 40 50 60 

orf 120-1. pep MMKTFKNIFSAAILSAALPCAYAAGLPQSAVLHYSGSYGIPATMTFERSGNAYKIVSTIK 

orfl20ng MMKTFKNIFSAAILSAALPCAYAARLPQSAVLHYSGSYGIPATMTFERSGNAYKIVSTIK 



)rfl20-l.i 
>rfl20ng 



VPLYNIRFESGGTWGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAM 
VPLYNIRFESGGTWGNTLHPAYYKDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAM 



orf 120-1. pep 
orfl20ng 



DLFTLAWQLAANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEWKYRVRRGD 
DLFTLAWQLAANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEWKYRVRRGD 



orf 120-1 .pep 
orf 120ng 



DAVMYFFAPSLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX 
DTVT YFFAP S LNN I PAQ I GYT D DGKT YT LKLKS VQINGQAAKPX 



This analysis, including the presence of a putative leader sequence in the gonococcal protein 
suggests that the proteins from N. meningitidis and N.gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 93 



The following partial DNA sequence was identified 'm.N. meningitidis <SEQ ID 78 1>: 
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1 ATGTATCGGA GGAAAGGGCG GGGCATCAAG CCGTGGATGG GTGCCGGTGC 

51 .GCGTTTGCC GCCTTGGTCT GGCTGGTTTT CGCGCTCGGC GATACTTTGA 

101 CTCCGTTTGC GGTTGCGGCG GTGCTGGCGT ATGTATTGGA CCCTTTGGTC 

151 GAATGGTTGC AGAAAAAGGG TTTGAACCGT GCATCCGCTT CGATGTCTGT 

201 GATGGTGTTT TCCTTGATTT TGTTGTTGGC ATTATTGTTG ATTATCGTCC 

251 CTATGCTGGT CGGGCAGTTC AACAATTTGG CATCGCGCCT GCCCCAATTA 

301 ATCGGTTTTA TGCAGAACAC GCTGCTGCCG TGGTTGAAAA ATACAATCGG 

351 CGGATATGTG GAAATCGATC AGGCATCTAT TATTGCGTGG CTTCAGGCGC 

401 ATACGGGAGA GTTGAGCAAC GCGCTTAAGG CGTGGTTTCC CGTTTTGATG 

451 AGGCAGGGCG GCAATATT . . 

This corresponds to the amino acid sequence <SEQ ID 782; 0RF121>: 

1 MYRRKGRGIK PWMGAGXAFA ALVWLVFALG DTLTPFAVAA VLAYVLDPLV 

51 EWLQKKGLNR ASASMSVMVF SLILLLALLL IIVPMLVGQF NNLASRLPQL 

101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW LQAHTGELSN ALKAWFPVLM 

151 RQGGNI.. 

Further work revealed the complete nucleotide sequence <SEQ JD 783>: 



1 ATGTATCGGA GGAAAGGGCG GGGCATCAAG CCGTGGATGG GTGCCGGTGC 

51 GGCGTTTGCC GCCTTGGTCT GGCTGGTTTT CGCGCTCGGC GATACTTTGA 

101 CTCCGTTTGC GGTTGCGGCG GTGCTGGCGT ATGTATTGGA CCCTTTGGTC 

151 GAATGGTTGC AGAAAAAGGG TTTGAACCGT GCATCCGCTT CGATGTCTGT 

201 GATGGTGTTT TCCTTGATTT TGTTGTTGGC ATTATTGTTG ATTATCGTCC 

251 CTATGCTGGT CGGGCAGTTC AACAATTTGG CATCGCGCCT GCCCCAATTA 

301 ATCGGTTTTA TGCAGAACAC GCTGCTGCCG TGGTTGAAAA ATACAATCGG 

351 CGGATATGTG GAAATCGATC AGGCATCTAT TATTGCGTGG CTTCAGGCGC 

4 01 ATACGGGAGA GTTGAGCAAC GCGCTTAAGG CGTGGTTTCC CGTTTTGATG 

4 51 AGGCAGGGCG GCAATATTGT CAGCAGTATC GGCAACCTGC TGCTGCTTCC 

501 CTTGCTGCTT TACTATTTCC TGCTGGATTG GCAGCGGTGG TCGTGCGGCA 

551 TTGCCAAACT GGTTCCGAgG CGTTTTGCCG GTGCTTATAC GCGCATTACA 

601 GGCAATTTGA ACGAGGTATT GGGCGAATTT TTGCGCGGGC AGCTTCTGGT 

651 AATGCTGATT ATGGGCTTGG TTTACGGTTT GGGATTGGTG CTGGTCGGGC 

701 TGGATTCGGG GTTTGCCATC GGTATGCTTG CCGGTATTTT GGTGTTTGTC 

751 CCTTATCTCG GGGCGTTTAC GGGATTGGTG CTTGCCACCG TCGCCGCCTT 

801 GCTCCAGTTC GGTTCGTGGA ACGGCATCCT ATCGGTTTGG GCGGTTTTTG 

8 51 CCGTAGGACA GTTTCTCGAA AGTTTTTTCA TTACGCCGAA AATCGTGGGA 

901 GACCGTATCG GGCTGTCGCC GTTTTGGGTT ATCTTTTCGC TGATGGCGTT 

951 CGGGCAGCTG ATGGGCTTTG TCGGAATGTT GGCGGGATTG CCTTTGGCCG 

1001 CCGTAACCTT GGTCTTGCTT CGCGAGGGCG TGCAGAAATA TTTTGCCGGC 

1051 AGTTTTTACC GGGGCAGGTA G 

This corresponds to the amino acid sequence <SEQ ID 784; 0RF121-1>: 

1 MYRRKGRGIK PWMGAGAAFA ALVWLVFALG DTL TPFAVAA VLAYVLDPLV 

51 EWLQKKGLNR ASASMS VMVF SLILLLALLL IIV PMLVGQF NNLASRLPQL 

101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW LQAHTGELSN ALKAWFPVLM 

151 RQGGNIVS SI GNLLLLPLLL YYFLL DWQRW SCGIAKLVPR RFAGAYTRIT 

201 GNLNEVLGEF LRGQL LVMLI MGLVYGLGLV LV GLDSGFAI GMLAG ILVFV 

251 PYLGAFTGLL LA TVAALLQF GSWNG ILSVW AVFAVGQFLE SF FITPKIVG 

301 DRIGLSPFWV IFSLMAFGQL MG FVGMLAGL PLAAVTLVLL REGVQKYFAG 

351 SFYRGR* 

Computer analysis of this amino acid sequence gave the following resuhs: 



Homology with a predicted ORF from N. meningitidis (strain A) 

0RF121 shows 98.7% identity over a 1 56aa overlap with an ORF (0RF121a) from strain A of TV. 
meningitidis: 

10 20 30 40 50 60 

orf 121 . pep MYRRKGRGIKPWMGAGXAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR 

orf 121a MYRRKGRGIKPWMDAGAAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR 
10 20 30 40 50 60 



70 80 90 100 110 120 

orf 121. pep ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV 
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orfl21a ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV 
70 80 90 100 110 120 

130 140 150 

orf 121 . pep EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNI 

I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

orf 121a EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNIVSSIGNLLLLPLLLYYFLLDWQRW 
130 140 150 160 170 180 



orf 121a SCGIAKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLVLVGLDSGFAI 
190 200 210 220 230 240 

The complete length 0RF121a nucleotide sequence <SEQ ID 785> is: 



ATGTATCGGA 
GGCGTTTGCC 
CTCCGTTTGC 
GAATGGTTGC 
GATGGTGTTT 
CTATGCTGGT 
ATCGGTTTTA 
CGGATATGTG 
ATACGGGCGA 
AGGCAGGGCG 
CTTGCTGCTT 
TTGCCAAACT 
GGCAATTTGA 
GATGCTGATT 
TGGATTCGGG 
CCCTATTTGG 
GCTCCAGTTC 
CCGTAGGACA 
GACCGTATCG 
CGGGCAGCTG 
CCGTAACCTT 
AGTTTTTACC 



GGAAAGGGCG 
GCCTTGGTCT 
GGTTGCGGCG 
AGAAAAAGGG 
TCCTTGATTT 
CGGGCAGTTC 
TGCAGAACAC 
GAAATCGATC 
GTTGAGCAAC 
GCAATATTGT 
TACTATTTCC 
GGTTCCGAGG 
ACGAGGTATT 
ATGGGTTTGG 
GTTTGCAATC 
GCGCGTTTAC 
GGTTCGTGGA 
GTTTCTCGAA 
GCCTGTCGCC 
ATGGGCTTTG 
GGTCTTGCTT 
GGGGCAGGTA 



GGGCATCAAG 
GGCTGGTTTT 
GTGCTGGCGT 
TTTGAACCGT 
TGTTGTTGGC 
AACAATTTGG 
GCTGCTGCCG 
AGGCATCTAT 
GCGCTTAAGG 
CAGCAGTATC 
TGCTGGATTG 
CGTTTTGCCG 
GGGCGAATTT 
TTTACGGCTT 
GGTATGGTTG 
AGGACTGCTG 
ACGGCATCTT 
AGTTTTTTCA 
GTTTTGGGTT 
TCGGAATGTT 
CGCGAGGGCG 



CCGTGGATGG 
CGCGCTCGGC 
ATGTATTGGA 
GCATCCGCTT 
ATTATTGTTG 
CATCGCGCCT 
TGGTTGAAAA 
TATTGCGTGG 
CGTGGTTTCC 
GGCAACCTGC 
GCAGCGGTGG 
GTGCTTATAC 
TTGCGCGGGC 
GGGGTTGGTG 
CCGGTATTTT 
CTGGCAACCG 
GGCTGTTTGG 
TTACGCCGAA 
ATCTTTTCGC 
GGCCGGATTG 
TGCAGAAATA 



ATGCCGGTGC 
GATACTTTGA 
CCCTTTGGTC 
CGATGTCTGT 
ATTATTGTCC 
GCCCCAATTA 
ATACAATCGG 
CTTCAGGCGC 
CGTTTTGATG 
TGCTGCTTCC 
TCGTGCGGCA 
GCGCATTACA 
AGCTTCTGGT 
CTGGTCGGGC 
GGTTTTTGTT 
TCGCCGCCTT 
GCGGTTTTTG 
AATCGTGGGA 
TGATGGCGTT 
CCTTTGGCCG 
TTTTGCCGGC 



This encodes a protein having amino acid sequence <SEQ ED 786>: 



1 MYRRKGRGIK PWMDAGAAFA ALVWLVFALG DTL TPFAVAA VLAYVLDPLV 

51 EWLQKKGLNR ASASMS VMVF SLILLLALLL IIV PMLVGQF NNLASRLPQL 

101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW LQAHTGELSN ALKAWFPVLM 

151 RQGGNIVS SI GNLLLLPLLL YYFLL DWQRW SCGIAKLVPR RFAGAYTRIT 

201 GNLNEVLGEF LRGQL LVMLI MGLVYGLGLV LV GLDSGFAI GMVAGILVFV 

251 PYL6AFTGLL LA TVAALLQF GSWMG ILAVW AVFAVGQFLE SF FITPKIVG 

301 DRIGLSPFWV IFSLMAFGQL MGF VGMLAGL PLAAVTLVLL REGVQKYFAG 

351 SFYRGR* 

0RF121a and 0RF121-1 show99.2% identity in 356 aa overlap: 

10 20 30 40 50 60 

orf 121a . pep MYRRKGRGIKPWMDAGAAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR 

orf 121-1 MYRRKGRGIKPWMGAGAAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR 
10 20 30 40 50 60 



70 80 90 100 110 120 

orf 121a . pep ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV 

orf 121-1 ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV 
70 80 90 100 110 120 



130 140 150 160 170 180 

or f 1 2 la . pep E I DQAS I lAWLQAHTGELSNALKAWFPVLMRQGGNIVS S IGNLLLLPLLLYYFLLDWQRW 

orf 12 1-1 EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNIVSSIGNLLLLPLLLYYFLLDWQRW 
130 140 150 160 170 180 



orfl21a.pep 



190 200 210 220 230 240 

SCGIAKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLVLVGLDSGFAI 
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SCGIAKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLVLVGLDSGFAI 



GMVAGILVFVPYLGAFTGLLLATVAALLQFGSWNGILAVWAVFAVGQFLESFFITPKIVG 
GMLAGILVFVPYLGAFTGLLLATVAALLQFGSWNGILSVWAVFAVGQFLESFFITPKIVG 



orf 121a. pep 



DRIGLSPFWVIFSLMAFGQLMGFVGMLAGLPLAAVTLVLLREGVQKYFAGSFYRGRX 
DRIGLSPFWVIFSLMAFGQLMGFVGMLAGLPLAAVTLVLLREGVQKYFAGSFYRGRX 



Homology with a predicted ORF from N.sonorrhoeae 

0RF121 shows 97.4% identity over a 156 aa overlap with a predicted ORF (0RF121ng) from 
N.gonorrhoeae: 



orf 121. pep 
orfl21ng 
orf 121. pep 
orfl21ng 
orf 121. pep 
orfl21ng 



MYRRKGRGIKPWMGAGXAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR 60 

i I I i I I I I I I I I I I I I I I ! I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

MYRRKGRGIKPWMGAGAAFAALVWLVYALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR 60 

ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV 120 
I I I i I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I M 

ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV 120 



E I DQAS I I AWLQAHTGELSNALKAWFPVLMRQGGNI 156 
EIDQASIIAWFQAHTGELSNALKAWFPVLMKQGGNIVSTIGNLLLPPLLLYYFLLDWHRW 180 

An 0RF121ng nucleotide sequence <SEQ ID 787> was predicted to encode a protein having amino 
acid sequence <SEQ ID 788>: 

1 MYRRKGRGIK PWMGAGAAFA ALVWLVYALG DTL TPFAVAA VLAYVLDPLV 

51 EWLQKKGLNR ASASMS VMVF SLILLLALLL IIV PMLVGQF NNLASRLPQL 

101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW FQAHTGELSN ALKAWFPVLM 

151 KQGGNIVS TI GNLLLPPLLL YYFLL DWHRW SCGIPKLVPR RFAGAYTRIT 

201 GNLNKVWGKF LRGQLLGETE RGAWCRVGR ECWEGGGARS RPSDDGWPRW 

251 GGG* 

Further work revealed the following gonoccocal DNA sequence <SEQ ID 789>: 



ATGTATCGGA 
GGCGTTTGCC 
CTCCGTTTGC 
GAATGGTTGC 
GATGGTGTTT 
CTATGCTGGT 
ATCGGTTTTA 
CGGATATGTG 
ATACGGGCGA 
AAACAGGGCG 
CTTGCTGCTT 
TCGCCAAACT 
GGTAATTTGA 
GATGCTGATT 
TGGATTCGGG 
CCCTATTTGG 
GCTCCAGTTC 
CCGTCGGTCA 
GACCGTATCG 
CGGAGAGCTG 
CCGTAACCTT 
AGTTTTTACC 



GAAAAGGACG 
GCCTTGGTCT 
GGTTGCGGCG 
AGAAAAAGGG 
TCCTTGATTT 
CGGGCAGTTC 
TGCAGAACAC 
GAAATCGATC 
GTTGAGCAAC 
GCAATATTGT 
TACTATTTCC 
GGTTCCGAGG 
ACGAGGTATT 
ATGGGCTTGG 
ATTTGCCATC 
GTGCGTTTAC 
GGTTCGTGGA 
GTTTCTCGAA 
GCCTGTCGCC 
ATGGGCTTTG 
GGTCTTGCTT 
GGGGCAGGTA 



GGGCATCAAG 
GGCTGGTTTA 
GTGCTGGCGT 
TTTGAACCGT 
TGTTGTTGGC 
AATAATTTGG 
GCTGCTGCCG 
AGGCATCTAT 
GCGCTTAAGG 
CAGCAGTATC 
TGCTGGATTG 
CGTTTTGCCG 
GGGCGAATTT 
TTTACGGTTT 
GGTATGGTTG 
GGGATTGCTG 
ACGGAATCTT 
AGTTTTTTCA 
GTTTTGGGTT 
TCGGAATGTT 
CGCGAGGGCG 



CCGTGGATGG 
CGCGCTCGGC 
ATGTGTTGGA 
GCATCCGCTT 
ATTATTGTTG 
CATCTCGCCT 
TGGTTGAAAA 
TATTGCGTGG 
CGTGGTTTCC 
GGCAACCTGC 
GCAGCGGTGG 
GTGCTTATAC 
TTGCGCGGTC 
GGGATTGATG 
CCGGTATTTT 
CTTGCCACTG 
GGCTGTTTGG 
TTACGCCGAA 
ATCTTTTCGC 
GGCCGGATTG 
CGCAGAAATA 



GTGCCGGCGC 
GATACTTTGA 
CCCTTTGGTC 
CGATGTCTGT 
ATTATTGTCC 
GCCCCAATTA 
ATACAATCGG 
TTTCAGGCGC 
CGTTTTGATG 
TGCTGCCGCC 
TCGTGCGGCA 
GCGCATTACG 
AGCTTCTGGT 
CTAGTCGGAC 
GGTGTTTGTC 
TTGCAGCCTT 
GCGGTTTTTG 
AATTGTAGGA 
TGATGGCGTT 
CCTTTGGCCG 
TTTTGCCGGC 
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This corresponds to the amino acid sequence <SEQ ID 790; 0RF121ng-l>: 

1 MYRRKGRGIK PWMGAGAAFA ALVWLVYALG DTL TPFAVAA VLAYVLDPLV 

51 EWLQKKGLNR ASASMS VMVF SLILLLALLL IIV PMLVGQF NNLASRLPQL 

101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW FQAHTGELSN ALKAWFPVLM 

5 151 KQGGNIVS SI GNLLLPPLLL YYFLL DWQRW SCGIAKLVPR RFAGAYTRIT 

201 GNLNEVLGEF LRGQL LVMLI MGLVYGLGLM LV GLDSGFAI GMVAGILVFV 

251 PYLGAFTGLL LA TVAALLQF GSWNG ILAVW AVFAVGQFLE SF FITPKIVG 

301 DRIGLSPFWV IFSLMAFGEL MG FVGMLAGL PLAAVTLVLL REGAQKYFAG 

351 SFYRGR* 

10 0RF121ng-l and 0RF121-1 show 97.5% identity in 356 aa overlap: 



orfl21-l .pep 
orfl21ng-l 



MYRRKGRGIKPWMGAGAAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR 
MYRRKGRGIKPWMGAGAAFAALVWLVYALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR 



orfl21-l .pep 
orfl21ng-l 



ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV 
ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV 



orf 121-1 .pep 
orfl21ng-l 



EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNIVSSIGNLLLLPLLLYYFLLDWQRW 
EIDQASIIAWFQAHTGELSNALKAWFPVLMKQGGNIVSSIGNLLLPPLLLYYFLLDWQRW 



orf 121-1 .pep 
orfl21ng-l 



SCGIAKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLVLVGLDSGFAI 



orf 121-1 .pep 
orfl21ng-l 



GMLAGILVFVPYLGAFTGLLLATVAALLQFGSWNGILSVWAVFAVGQFLESFFITPKIVG 
GMVAGILVFVPYLGAFTGLLLATVAALLQFGSWNGILAVWAVFAVGQFLESFFITPKIVG 



orfl21-l.pei 
orfl21ng-l 



DRIGLSPFWVIFSLMAFGQLMGFVGMLAGLPLAAVTLVLLREGVQKYFAGSFYRGRX 
DRIGLSPFWVIFSLMAFGELMGFVGMLAGLPLAAVTLVLLREGAQKYFAGSFYRGRX 



In addition, 0RF121ng-l shows homology to a permease from H.influenzae: 

sp|P43969|PERM_HAEIN PUTATIVE PERMEASE PERM HOMOLOG Length = 349 
Score = 69.9 bits (168), Expect = 2e-ll 

s = 67/317 (21%), Positives = 120/317 (37%), Gaps = 7/317 (2%) 

VYALGDTLTPFAVAAVLAYVLDPLVEWL-QKKGLNRASASMSVMVFSXXXXXXXXXXXVP 8 4 
+Y GD + P +A VL+Y+L+ + +L Q R A++ + VP 

lYFFGDLIAPLLIALVLSYLLEIPINFLNQYLKCPRMLATILIFGSFIGLAAVFFLVLVP 91 

MLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYVE-IDQASIIAWFQAHTGELSNALK 14 3 
ML Q +L S LP + N WL N Y E ID + + + F + ++ + 

MLWNQTISLLSDLPAMF NKSNEWLLNLPKNYPELIDYSMVDSIFNSVREKILGFGE 147 



Identitie 




26 


Sbjct: 


32 




85 


Sbjct: 


92 




14' 



Sbjct: 



204 NEVLGEFLRGQXXXXXXXXXXXXXXXXXXXXDSGFAIGMVAGILVFVPYXXXXXXXXXXX 263 

+ + ++ G+ + + G+ V VPY 

207 QQQISNYIHGKLLEILIVTLITYIIFLIFGLNYPLLLAFAVGLSVLVPYIGAVIVTIPVA 266 
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Query: 264 XXXXXQFGSWNGILAVWAVFAVGQFLESFFITPKIVGDRIGLSPFWVIFSLMAFGELMGF 323 

QFG + FAV QL+ +P+++LP+I S++ FG L GF 

Sbjct: 267 LVALFQFGISPTFWYIIIAFAVSQLLDGNLLVPYLFSEAVNLHPLIIIISVLIFGGLWGF 326 

Query: 324 VGMLAGLPLAAVTLVLL 340 

G+ +PLA + ++ 
Sbjct: 327 WGVFFAIPLATLVKAVI 343 

Based on this analysis, including the presence of a putative leader sequence and transmembrane 
domains in the two proteins, it is predicted that the proteins from N.meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 



Example 94 



The following partial DNA sequence was identified in N.meningitidis <SEQ ID 791>: 

1 . .ACTGCTTTTT CGGCGGCGCT GCGCTTGAGT CCATCATGAC TCGTCATATT 

51 TTTGTCCTTT GGGAAACCGT ATCAACAAAC AGCCGCCATC TTAACATTTT 

101 TTTGCACGTC CTGCCCGCCG CGTTCAAATG CGTACCAGCA ATACCGCCGC 

151 CTGCGCCTCT ATGCCTTCCA TCCGCCCGAG ATAGCCGAGT TTTTCGTTGG 

201 TTTTGCCTTT GATGTTGACG CACGAAATGT CTATGCCCAA ATCGGCGGCG 

251 ATGTTGGCAC GCATTTGCGG AATGTGCGGC GCGAGTGTGG GTTTCTGTGC 

301 AATCACGGTC GTATCGACAT TGACCGCCTG CCAACCCTGC GCCTGAACGC 

351 TTTGATACGC CGCACGCAAA AGGACGCGGC TGTCCGCATC TTTGAACTCT 

401 GCGGCGGTGT CGGGGAAATG GCTGCCGATA TCGCCCAAAC CTGCCGCACC 

451 GAGCAGCGCG TCGGTAACGG CGTGCAGCAG CGCATCGGCA TCGGAGTGTC 

501 CGAGCAGCCC TTTTTCAAAT GGGATTTCAA CTCCGCCAAG TATCAG.. 

This corresponds to the amino acid sequence <SEQ ID 792; ORF122>: 



1 ..TAFSAALRLS PSXLVIFLSF GKPYQQTAAI LTFFCTSCPP RSNAYQQYRR 

51 LRLYAFHPPE lAEFFVGFAF DVDARNVYAQ IGGDVGTHLR NVRRECGFLC 

101 NHGRIDIDRL PTLRLNALIR RTQKDAAVRI FELCGGVGEM AADIAQTCRT 

151 EQRVGNGVQQ RIGIGVSEQP FFKWDFNSAK YQ . . 

Further work revealed the complete nucleotide sequence <SEQ ID 793>: 



1 ATATCGTACT GGGCAAGCAG TTCGCCGGAT TTTTTGGAAG TAGATACCGC 

51 GCCTTTGATT TTTTTGCCGC TCTTACCCAA GGCTTCGATG PAPiAAGTTCA 

101 TGGTCGAGCC GGTACCGATG CCGATATATT CATTTTCGGG TACGAATTCG 

151 ACTGCTTTTT CGGCGGCGAT GCGCTTGAGT TCGTCTTGTG TCGTCATATT 

201 TTTGTCCTTT GGGAAACCGT ATCAACAAAC AGCCGCCATC TTAACATTTT 

251 TTTGCACGTC CTGCCCGCCG CGTTCAAATG CGTACCAGCA ATACCGCCGC 

301 CTGCGCCTCT ATGCCTTCCA TCCGCCCGAG ATAGCCGAGT TTTTCGTTGG 

351 TTTTGCCTTT GATGTTGACG CACGAAATGT CTATGCCCAA ATCGGCGGCG 

401 ATGTTGGCAC GCATTTGCGG AATGTGCGGC GCGAGTTTGG GTTTCTGTGC 

451 AATCACGGTC GTATCGACAT TGACCGCCTG CCAACCCTGC GCCTGAACGC 

501 TTTGATACGC CGCACGCAAA AGGACGCGGC TGTCCGCATC TTTGAACTCT 

551 GCGGCGGTGT CGGGGAAATG GCTGCCGATA TCGCCCAAAC CTGCCGCACC 

501 GAGCAGCGCG TCGGTAACGG CGTGCAGCAG CGCATCGGCA TCGGAGTGTC 

651 CGAGCAGCCC TTTTTCAAAT GGGATTTCAA CTCCGCCAAG TATCAGCTTT 

701 CTGCCTTCGG TCAGTTGGTG GACATCGTAG CCCTGTCCGA TACGGATGTT 

751 CGTCATCGTT TGTGTTCCTG A 

This corresponds to the amino acid sequence <SEQ ID 794; ORF122-l>: 

1 ISYWASSSPD FLEVDTAPLI FLPLLPKASM KKLMVEPVPM PIYSFSGTNS 

51 T AFSAAMRLS SSCVVIFL SF GKPYQQTAAI LTFFCTSCPP RSNAYQQYRR 

101 LRLYAFHPPE lAEFFVGFAF DVDARNVYAQ IGGDVGTHLR NVRREFGFLC 

151 NHGRIDIDRL PTLRLNALIR RTQKDAAVRI FELCGGVGEM AADIAQTCRT 

201 EQRVGNGVQQ RIGIGVSEQP FFKWDFNSAK YQLSAFGQLV DIVALSDTDV 

2 51 RHRLCS* 



Computer analysis of this amino acid sequence gave the following results: 
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Homology with a predicted ORF from N.meninsitidis Cstrain A) 

ORF122 shows 94.0% identity over a 182aa overlap with an ORF (ORF122a) from strain A of N. 
meningitidis: 

10 20 30 

orf 122 .pep TAFSAALRLSPSXLVIFLSFGKPYQQTAAI 
orfl22a FLPLLPKASMKKLMVEPVPMPMYSFSGTNSTAFSAAMRLSSSCWIFLSFGKPYQQTAAI 



orfl22.pep LTFFCTSCP PRSNAYQQYRRLRLYAFHP PE lAE FFVG FAFDVDARNVYAQ I GGDVGTHLR 
orf 122a LTFFXTSCPPRSNPYQQYRRLRLYAFHAPEITEFFVGFAFXVDARNVYAQIGGDVGTHLR 



NVRRECGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRIFELCGGVGEMAADIAQTCRT 
I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! 
NMRREFGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRIFELCGGVGEMAADIAQTCRT 



orf 122 . pep EQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQ 

orf 122a EQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQLSAFGQLVDIVALSDTDVRHRLCSX 



The complete length ORF122a nucleotide sequence <SEQ ID 795> is; 



ATATCATATT 
GCCTTTGATT 
TGGTCGAACC 
ACTGCNTTTT 
TTTGTCCTTT 
TTNNNACGTC 
CTGCGACTCT 
TTTTGCCTTT 
ATGTTGGCAC 
AATCACGGTC 
TTTGATACGC 
GCGGCGGTGT 
GAGCAGCGCG 
CGAGCAGCCC 
CTGCCTTCGG 
CGTCATCGTT 



GGGCAAGCAG 
TTTTTGCCGC 
GGTACCGATG 
CGGCGGCGAT 
GGGAAACCGT 
CTGCCCGCCG 
ATGCCTTCCA 
GANGTTGACG 
GCATTTGCGG 
GTATCGACAT 
CGCACGCAAA 
CGGGGAAATG 
TCGGTAACGG 
TTTTTCAAAT 
TCAGTTGGTG 
TGTGTTCCTG 



TTCACTGGAT 
TCTTACCCAA 
CCGATGTATT 
GCGCTTGAGT 
ATCAACAAAC 
CGTTCAAATC 
TGCGCCCGAG 
CACGAAATGT 
AATATGCGGC 
TGACCGCCTG 
AGGACGCGGC 
GCTGCCGATA 
CGTGCAGCAG 
GGGATTTCAA 
GACATCGTAG 



TTTTTGGAAG 
GGCTTCGATG 
CGTTTTCGGG 
TCGTCTTGTG 
AGCCGCCATC 
CTTACCAGCA 
ATAACCGAGT 
CTATGCCCAA 
GCGAGTTTGG 
CCAACCCTGC 
TGTCCGCATC 
TCGCCCAAAC 
CGCATCGGCA 
CTCCGCCAAG 
CCCTGTCCGA 



TAGATACCGC 
AAAAAGTTGA 
TACGAATTCG 
TCGTCATATT 
TTAACATTTT 
ATACCGCCGC 
TTTTCGTTGG 
ATCGGCGGCG 
GTTTCTGTGC 
GCCTGAACGC 
TTTGAACTCT 
CTGCCGCACC 
TCGGAGTGTC 
TATCAGCTTT 
TACGGATGTT 



This encodes a protein having amino acid sequence <SEQ ID 796>: 

1 ISYWASSSLD FLEVDTAPLI FLPLLPKASM KKLMVEPVPM PMYSFSGTNS 

51 T AFSAAMRLS SSCVVIFL SF GKPYQQTAAI LTFFXTSCPP RSNPYQQYRR 

101 LRLYAFHAPE ITEFFVGFAF XVDARNVYAQ IGGDVGTHLR NMRREFGFLC 

151 NHGRIDIDRL PTLRLNALIR RTQKDAAVRI FELCGGVGEM AADIAQTCRT 

201 EQRVGNGVQQ RIGIGVSEQP FFKWDFNSAK YQLSAFGQLV DIVALSDTDV 

251 RHRLCS* 

ORF122a and ORF122-1 show 96.9% identity in 256 aa overlap: 



ISYWASSSLDFLEVDTAPLIFLPLLPKASMKKLMVEPVPMPMYSFSGTNSTAFSAAMRLS 
ISYWASSSPDFLEVDTAPLIFLPLLPKASMKKLMVEPVPMPIYSFSGTNSTAFSAAMRLS 



orf 122a. pep SSCWIFLSFGKPYQQTAAILTFFXTSCPPRSNPYQQYRRLRLYAFHAPEITEFFVGFAF 
orf 122-1 SSCVVIFLSFGKPYQQTAAILTFFCTSCPPRSNAYQQYRRLRLYAFHPPEIAEFFVGFAF 
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orf 122a. pep XVDARNVYAQIGGDVGTHLRNMRREFGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRI 
orf 122-1 DVDARNVYAQIGGDVGTHLRNVRREFGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRI 



FELCGGVGEMAADIAQTCRTEQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQLSAFGQLV 
FELCGGVGEMAADIAQTCRTEQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQLSAFGQLV 



DIVALSDTDVRHRLCSX 



DIVALSDTDVRHRLCSX 



Homology with a predicted ORF from N.sonorrhoeae 

ORF122 shows 89.6% identity over a 182 aa overlap with a predicted ORP (ORF122ng) from 
N. gonorrhoeae: 

orf 122. pep TAFSAALRLSPSXLVIFLSFGKPYQQTAAI 30 

orfl22ng FLPLLPKASMKKLMVEPVPMPMYSFSGTNSTAFSAAMRLSSSCVVIFLSFGKPYQQTAAI 80 

orf 122 .pep LTFFCTSCPPRSNAYQQYRRLRLYAFHPPEIAEFFVGFAFDVDARNVYAQIGGDVGTHLR 90 
orfl22ng LTFFCTSWPPRSNPYQQYRRLRLYAFHPPEIAEFFVGFAFDIDARNIDTQIGGDVGTHLR 140 

orf 122 .pep NVRRECGFLCNHGRIDIDRLPTLRLNALXRRTQKDAAVRIFELCGGVGEMAADIAQTCRT 150 
orfl22ng NVRCEFGFLCNHGRIDIDHLPTLRLNALIRRTQKDAAVRIFELCGGVGKMAADVAQTCRT 200 

orf 122. pep EQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQ 182 
orfl22ng EQRVGNGVQQRVGIRMPEQPFFKWDFNSAKYQLSAFGQLVDIVALSDTDIRHRLCS 256 



The complete length ORF122ng nucleotide sequence <SEQ ID 797> is: 





ATGTCGTACC 


GGGCAAGCAG 


51 


GCCTTTGATT 


TTTTTACCGC '. 


101 


tgGTCGAACC 


GgtaCCGATG ( 


151 


ACTGCTTTTT 


CGGCGGCGAT C 


201 


TTTAtccttt 


gGGAAaccct < 


251 


TTTGCACGtc 


ctggccgccg ( 


301 


ctgcgcctCT 


AtgcCTTCCA ■: 


351 


TTTTGCCTTT 


GATatTGACG ( 


401 


ATGTTGGCAC 


GCATTTGCGG i 


451 


AATCACGGTC 


GTATCGACAT 


501 


TTTGATACGC 


CGCACGCAAA 1 


551 


GCGGCGGTGT 


CGGGAAAATG ( 


601 


GAGCAGCgcg 


tcggtaaCGG ( 


651 


CGAGCAGCCC 


TTTTTCAAAT ( 


701 


CTGCCTTCGG 


TCAATTGGTG C 


751 


CGTCATCGTT 


TGTGTTCCTG 1 


This encodes a 


protein having amino acid 



TTCGCCGGAT 
TTTTGCCCAA 
CCGATGTATT 
GCGCttgAgt 
atcaAcaAAc 
cgttcaAATc 
TCCGCCCGAG 
CACGAAATAT 
AATGTGCGGT 
TGACCACCTG 
AGGACGCGGC 
GCTGCCGATG 
CGTGCAGCAG 
GGGATTTCAA 
GACATCGTAG 



TTTTTGGAGG 
GGCTTCGATG 
CGTTTTCGGG 
TCgtcttgcg 
agccgccatC 
cgtaccaGca 
ATAGCCGAGT 
CGatacCCAa 
GCGAGTTTGG 
CCAACCCTGC 
TGTCCGCATC 
TCGCCCAAAC 
cgcgTcgGCA 
CTCCGCCAAG 
CCCTGTCCGA 



TTGAAACCGC 
AAGAAATTGa 
TACGAATTCG 
TcgTCATATT 
TTAACATTTT 
ataccgccgc 
TTTTCGTTGG 
atcggcgGCG 
GTTTCTGTGC 
GCCTGAACGC 
TTTGAACTCT 
CTGCCGCACC 
TCCGAATGCC 
TATCAGCTTT 
TACGGATATT 



MSYRASSSPD FLEVETAPLI FLPLLPKASM KKLMVEPVPM PMYSFSGTNS 
T AFSAAMRLS SSCWIFL SF GKPYQQTAAI LTFFCTSWPP RSNPYQQYRR 
LRLYAFHPPE lAEFFVGFAF DIDARNIDTQ IGGDVGTHLR NVRCEFGFLC 
NHGRIDIDHL PTLRLNALIR RTQKDAAVRI FELCGGVGKM AADVAQTCRT 
EQRVGNGVQQ RVGIRMPEQP FFKWDFNSAK YQLSAFGQLV DIVALSDTDI 
RHRLCS* 



wo 99/24578 



-442- 



PCT/IB98/01665 



ORF122ng and ORF122-1 show 92.6% identity in 256 aa overlap: 



ISYWASSSPDFLEVDTAPLIFLPLLPKASMKKLMVEPVPMPIYSFSGTNSTAFSAAMRLS 
: I I I I I I I I I i I I : 1 I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I 
MSYRASSSPDFLEVETAPLIFLPLLPKASMKKLMVEPVPMPMYSFSGTNSTAFSAAMRLS 



SSCWIFLSFGKPYQQTAAILTFFCTSCPPRSNAYQQYRRLRLYAFHPPEIAEFFVGFAF 
SSCWIFLSFGKPYQQTAAILTFFCTSWPPRSNPYQQYRRLRLYAFHPPEIAEFFVGFAF 



orf 122-1. pep DVDARNVYAQIGGDVGTHLRNVRREFGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRI 
orfl22ng DIDARNIDTQIGGDVGTHLRNVRCEFGFLCNHGRIDIDHLPTLRLNALIRRTQKDAAVRI 



FELCGGVGEMAADIAQTCRTEQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQLSAFGQLV 
FELCGGVGKMAADVAQTCRTEQRVGNGVQQRVGIRMPEQPFFKWDFNSAKYQLSAFGQLV 



orf 122-1. pep 
orfl22ng 



DIVALSDTDVRHRLCSX 
DIVALSDTDIRHRLCSX 



Based on this analysis, it is predicted that the proteins Scom N.meningitidis and N.gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 95 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 799>: 

1 . . GCCGGCGCGA GTGCGAACAA CATTTCCGCG CGTTTTGCGG AAACACCCGT 

51 CGCTGTCAGC GTTACCCTGA TCGGCACGGT ACTTGCCGTC ATGCTGCCCG 

101 TTACCGAATA TGAAAACTTC CTGCTGCTTA TCGGCTCGGT ATTTGCGCCG 

151 ATGgGGCGGA ITTTGATTGC CGACTTTTTC GTCTTGAAAC GGCGTGA 

This corresponds to the amino acid sequence <SEQ ED 800; ORF125>: 



Further work revealed the complete nucleotide sequence <SEQ ID 801 > 



ATGTCGGGCA 
TTGGTTCGGC 
TTGCGCCTTT 
GCCGTCGGCG 
CGGACGCAGC 
CAGTGCTGTT 
GTGATGATTT 
GTGGGACGGC 
TTGTGCTGTG 
GTTTCGATGC 
CTTTTCCACG 
TCGGAACGGC 
CTTGCCGCCG 
GACGGCAACG 
GTTTGGCAGC 
CTGGGCGCAG 



ATGCCTCCTC 
GCGGCGGTAT 
GGGCTGGCAG 
GCGCGCTGTT 
TCGATGGAAA 
TTCCGTGGCG 
ACGCCGGCGC 
GAATCTTTTG 
GCTGGTTTTC 
TGCTGATGCT 
GCAGGCAGCA 
AGTCGAGCTG 
ACTACACGCG 
CTCGCCTACA 
GGCGTTGTTC 
GTTTGGGTGC 



TCCTTCATCT 
CGATTGCCGA 
CGCGGTCTGG 
TTTTGCGGCG 
GCGTGCGCCT 
AATATGCTGC 
AACGGTCAGC 
TCTGGTGGGC 
GGCGCACGCA 
GTTGGCGGTT 
CCGCCGCACA 
TCCGCCGTGA 
CCACGCGCGC 
CGCTGACCGG 
ACCGGAGAAA 
GGCAGGCATT 



TCCTCCGCCA 
AATCAGCACG 
CGGCTCTACT 
GCGTATATCG 
GTCGTTCGGC 
AACTGGCCGG 
TCCGCTTTGG 
ATTGGCAAAC 
AAACAGGCGG 
CTGTGGCTGA 
GGTTTCAGAC 
TGCCGCTTTC 
CGCCCGTTTG 
CTGCTGGATG 
CCGACGTGGC 
TTGGCGGTCG 



TCGGGCTGAT 
GGTACGCTGC 
TTTGGGTCAT 
GCGCACTGAC 
AAACGCGGTT 
CTGGACGGCG 
GCAAAGTGTT 
GGCGCGCTGA 
GCTGAAAACC 
GTGCCGAAGT 
GGCATGAGTT 
CTGGCTGCCG 
CGGCAACCCT 
TATGCCTTGG 
AAAAATCCTG 
TCCTCTCCAC 
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8 01 CGTTACCACA ACGTTTCTCG ATGCCTATTC CGCCGGCGCG AGTGCGAACA 

8 51 ACATTTCCGC GCGTTTTGCG GAAACACCCG TCGCTGTCGG CGTTACCCTG 

901 ATCGGCACGG TACTTGCCGT CATGCTGCCC GTTACCGAAT ATGAAAACTT 

951 CCTGCTGCTT ATCGGCTCGG TATTTGCGCC GATGGCGGCG GTTTTGATTG 

1001 CCGACTTTTT CGTCTTGAAA CGGCGTGAGG AGATTGAAGG CTTTGACTTT 

1051 GCCGGACTGG TTCTGTGGCT TGCGGGCTTC ATCCTCTACC GCTTCCTGCT 

1101 CTCGTCCGGC TGGGAAAGCA GCATCGGTCT GACCGCCCCC GTAATGTCTG 

1151 CCGTTGCCAT TGCCACCGTA TCGGTACGCC TTTTCTTTAA AAAAACCCAA 

1201 TCTTTACAAA GGAACCCGTC ATGA 

This corresponds to the amino acid sequence <SEQ ID 802; ORF125-l>: 

1 MSGNASSPSS SSAIGLIWFG AAVSIAEI5T GTLLAPLGWQ RGLAALLLGH 

51 AVGG ALFFAA AYIGALTGRS SMESVRLSFG KRGSVLFSVA NMLQLAGWTA 

101 VMIYAGATV3 SALGKVLWDG ES FVWWALAN GALIVLWLV F GARKTGGLKT 

151 VS MLLMLLAV LWLSAEVF ST AGSTAAQVSD GMSFGTAVEL SAVMPLSWLP 

201 LAADYTRHAR RPFAATLTAT LAYTLTGCWM YALGLAAALF TGETDVAKIL 

251 LGAGLGAAGI LAWL STVTT TFLDAYSAGA SANNISARFA E TPVAVGVTL 

301 IGTVLAVM LP VTEYEN FLLL IGSVFAPMAA VLI ADFFVLK RREEIEGFDF 

351 AGLVLWLAGF ILYRFLL SSG WESSIGLT AP VMSAVAIATV SVRLFF KKTQ 

4 01 SLQRNPS* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N.meninsitidis (strain A) 

ORF125 shows 76.5% identity over a 51aa overlap with an ORF (ORF125a) from sfrain A of TV. 
meningitidis: 



AGASANNISARFAETPVAVSVTLIGTVLAV 
I I : I I I I I I I : : : I I : I I : I : : : I I : I I I 
KI LLGAGLGAAGI LAWLSTVTTT FLDAYSAGVSANNI SAKLSE I P lAVAVAWGTLLAV 



MLPVTEYENFLLLIGSVFAPMGGFDCRLFRLETAX 
: I I I I I I I I I I I I I I I I I I I I : 

LLPVTEYENFLLLIGSVFAPMAAVLIADFFVLKRREEIEG 



The ORF125a partial nucleotide sequence <SEQ ID 803> is: 



1 ATGTCGGGCA ATGCCTCCTC TCNTTCATCT TCCGCCGCCA TCGGGCTGAT 

51 TTGGTTCGGC GCGGCGGTAT CGATTGCCGA AATCAGCACG GGTACACTGC 

101 TTGCGCCTTT GGGCTGGCAG CGCGGTCTGG CNGCTCTGCT TTTGGGTCAT 

151 GCCGTCGGCG GCGCGCTGTT TTTTGCGGCG GCGTATATCG GCGCACTGAC 

201 CGGACNCANC TCGATGGAAA GCGTGCGCCT GTCGTTCGGC AAACGCGGTT 

251 CAGTGCTGTT TTCCGTGGCG AATATGCTGC AACTGGCCGG CTGGACGGCG 

301 GTGATGATTT ACGCCGGCGC AACGGTCAGC TCCGCTTTGG GCAAAGTGTT 

351 GTGGGACGGC GAATCTTTTG TCTGGTGGGC ATTGGCAAAC GGCGCGCTGA 

4 01 TTGTGCTGTG GCTGGTTTTC GGCGCACGCA AAACAGGCGG GCTGAAAACC 

4 51 GTTTCGATGC TGCTGATGCT GTTGGCGGTT CTGTGGCTGA GTGCCGAANT 

501 NTTTTCCACG GCAGGCAGCA CCGCCGCANN GGTNNCAGAC GGCATGAGTT 

551 TCGGAACGGC AGTCGAGCTG TCCGCCGTNA TGCCGCTTTC TTGGCTGCCG 

601 CTGGCCGCCG ACTACACGCG CCACGCGCGC CGCCCGTTTG CGGCAACCCT 

651 GACGGCAACG CTCGCCTACA CGCTGACCGG CTGCTGGATG TATGCCTTGG 

701 GTTTGGCAGC GGCGTTGTTC ACCGGAGAAA CCGACGTGGC AAAAATCCTG 

751 CTGGGCGCAG GTTTGGGTGC GGCAGGCATT TTGGCGGTCG TCCTGTCGAC 

801 CGTTACCACC ACTTTTCTCG ATGCNTACTC CGCCGGCGTA AGTGCCAACA 

851 ATATTTCCGC CAAACTTTCG GAAATACGNA TCGCCGTTGC CGTCGCCGTT 

901 GTCGGCACAC TGCTTGCCGT CCTCCTGCCC GTTACCGAAT ATGAAAACTT 

951 CCTGCTGCTT ATCGGCTCGG TATTTGCGCC GATGGCGGCG GTTTTGATTG 

1001 CCGACTTTTT CGTCTTGAAA CGGCGTGAGG AGATTGAAGG C. . 

This encodes a protein having the partial amino acid sequence <SEQ ID 804>: 

1 MSGNASSXSS SAAIGLIWFG AAVSIAEIST GTLLAPLGWQ RGLAALLLGH 

51 AVGG ALFFAA AYIGALTGXX SMESVRLSFG KRGSVLFSVA NMLQLAGWTA 
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101 VMIYAGATVS SALGKVLWDG ES FVWWALAN GALIVLWLV F GARKTGGLKT 

151 VS MLLMLLAV LWLSAEXF ST AGSTAAXVXD GMSFGTAVEL SAVMPLSWLP 

201 LAADYTRHAR RPFAATLTAT LAYTLTGCWM YALGLAAALF TGETDVAKIL 

251 LGAGLGAAGI LAVVL STVTT TFLDAYSAGV SANNISAKLS E IPIAVAVAV 

301 VGTLLAVL LP VTEYEN FLLL IG5VFAPMAA VLI ADFFVLK RREEIEG. ■ 

ORF125a and ORF125-1 show 94.5% identity in 347 aa overlap: 



orf 125a. pep MSGNASSXSSSAAIGLIWFGAAVSIAEISTGTLLAPLGWQRGLAALLLGHAVGGALFFAA 
or f 125-1 MSGNASSPSSSSAIGLIWFGAAVSIAEISTGTLLAPLGWQRGLAALLLGHAVGGALFFAA 



.rf 125a. pep AYIGALTGXXSMESVRLSFGKRGSVLFSVANMLQLAGWTAVMIYAGATVSSALGKVLWDG 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
.rfl2 5-l AYIGALTGRSSMESVRLSFGKRGSVLFSVANMLQLAGWTAVMIYAGATVSSALGKVLWDG 



130 140 150 160 170 180 

orf 125a. pep ESFVWWALANGALIVLWLVFGARKTGGLKTVSMLLMLLAVLWLSAEXFSTAGSTAAXVXD 

Orfl25-l ES FVWWALANGALI VLWLVFGARKTGGLKTVSMLLMLLAVLWLSAEVFSTAGSTAAQVS D 

130 140 150 160 170 180 



190 200 210 220 230 240 

orf 125a. pep GMSFGTAVELSAVMPLSWLPLAADYTRHARRPFAATLTATLAYTLTGCWMYALGLAAALF 

orf 125-1 GMSFGTAVELSAVMPLSWLPLAADYTRHARRPFAATLTATLAYTLTGCWMYALGLAAALF 
190 200 210 220 230 240 

250 260 270 280 290 300 

orf 125a. pep TGETDVAKILLGAGLGAAGILAVVLSTVTTTFLDAYSAGVSANNISAKLSEIPIAVAVAV 

orf 12 5-1 TGETDVAKILLGAGLGAAGILAVVLSTVTTTFLDAYSAGASANNISARFAETPVAVGVTL 
250 260 270 280 290 300 

310 320 330 340 

orf 125a. pep VGTLLAVLLPVTEYENFLLLIGSVFAPMAAVLIADFFVLKRREEIEG 

orf 125-1 IGTVLAVMLPVTEYENFLLLIGSVFAPMAAVLIADFFVLKRREEIEGFDFAGLVLWLAGF 
310 320 330 340 350 360 

Homology with a predicted ORF from N.sonorrhoeae 

ORF125 shows 86.2% identity over a 65aa overlap with a predicted ORF (ORF125ng) from 
N. gonorrhoeae: 



orf 125. pep AGASANNISARFAETPVAVSVTLIGTVLAV 30 

orfl25ng KILLGAGLGITGILAVVLSTVTTTFLDTYSAGASANNISARFAEIPVAVGVTLIRTVLAV 308 

orf 125. pep MLPVTEYENFLLLIGSVFAPM-GGFDCRLFRLETA 64 

orfl25ng MLPVTEYKNFLLLIRSVFGPMAGGFDCRLFCLKTA 343 

An ORF125ng nucleotide sequence <SEQ ID 805> was predicted to encode a protein having amino 
acid sequence <SEQ ID 806>: 



1 MSGNASSPSS SAAIGLVWFG AAVSIAEIST GTLLAPLGWQ RGLAALLLGH 

51 AVGGALFFAA AYIGALTGRS SMESVRLSFG KCG3VLF3VA NMLQLAGWTA 

101 VMIYVGATVS SALGKVLWDG ES FVWWALAN GALIVLWLV F GARRTGGLKT 

151 VS MLLMLLAV LWLSVEVFA S SGTNAAPAVS DGMTFGTAVE LSAVMPLSWL 

201 PLAADYTRQA RRPFAATLTA TLAYTLTGCW MYALGLAAAL FTGETDVAKI 

251 LLGAGLGITG ILAWL STVT TTFLDTYSAG ASANNISARF AEIPVAVGVT 

301 LIRTVLAVM L PVTEYKNFLL LIRSVFGPMA GGFDCRLFCL KTA* 
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Further work revealed the following gonococcal DNA sequence <SEQ ID 807>: 



1001 
1051 
1101 
1151 
1201 



ATGTCGGGCA 
TTGGTTCGGC 
TCGCCCCCTT 
GCCGTCGGCG 
CGGACGCAGC 
CAGTGCTGTT 
GTGATGATTT 
GTGGGACGGC 
TCGTGCTGTG 
GTTTCGATGC 
GTTCGCTTCG 
CCTTCGGAAC 
CCGCTGGCCG 
CCTGACGGCA 
TGGGTTTGGC 
CTGTTGGGCG 
CACCGTTACC 
ACAACATTTC 
CTGATCGGCA 
CTTCCTGCTG 
TTGCCGACTT 
TTTGCCGGAC 
GCTCTCGTCC 
CTGCCGTTGC 
CAATCTTTAC 



ATGCCTCCTC 
GCGGCGGTAT 
GGGCTGGCAG 
GCGCGCTGTT 
TCGATGGAAA 
TTCCGTGGCG 
ACGTCGGCGC 
GAATCCTTTG 
GCTGGTTTTC 
TGCTGATGCT 
TCCGGCACAA 
GGCAGTCGAA 
CCGACTACAC 
ACGCTCGCCT 
GGCGGCTCTG 
CGGGCTTGGG 
ACAACGTTTC 
CGCGCGTTTT 
CGGTGCTTGC 
CTTATCGGCT 
TTTCGTCTTA 
TGGTTCTGTG 
GGTTGGGAAA 
CATTGCCACC 
AAAGGAACCC 



TCCTTCATCT 
CGATTGCCGA 
CGCGGTCTGG 
TTTTGCGGCG 
GTGTGCGCCT 
AATATGCTGC 
AACGGTCAGC 
TCTGGTGGGC 
GGCGCACGCA 
GCTTGCCGTG 
ACGCCGCGCC 
CTGTCCGCCG 
GCGCCAAGCA 
ATACGCTGAC 
TTTACCGGAG 
CATAACGGGC 
TCGATACCTA 
GCGGAAATAC 
CGTCATGCTG 
CGGTATTTGC 
AAACGGCGTG 
GCTGGCAGGC 
GCAGCATCGG 
GTATCGGTAC 
GTCATGA 



TCCGCCGCCA 
AATCAGCACG 
CGGCCCTGCT 
GCGTATATCG 
GTCGTTCGGC 
AACTGGCCGG 
TCCGCTTTGG 
ATTGGCAAAC 
GAACGGGCGG 
TTGTGGTTGA 
CGCCGTTTCA 
TCATGCCGCT 
CGCCGCCCGT 
GGGCTGCTGG 
AAACCGACGT 
ATTCTGGCAG 
TTCCGCCGGC 
CCGTCGCTGT 
CCCGTTACCG 
GCCGATGGCG 
AGGAGATTGA 
TTCATCCTCT 
TCTGACCGCC 
GCCTTTTCTT 



TCGGGCTGGT 
GGTACGCTGC 
TTTGGGTCAT 
GCGCACTGAC 
AAATGCGGTT 
CTGGACGGCG 
GCAAAGTGTT 
GGCGCACTGA 
GCTGAAAACC 
GCGTCGAAGT 
GACGGCATGA 
TTCCTGGCTG 
TTGCGGCAAC 
ATGTATGCCT 
GGCGAAAATC 
TCGTCCTCTC 
GCGAGTGCGA 
CGGCGTTACC 
AATATAAAAA 
GCGGTTTTGA 
AGGCTTTGAC 
ACCGCTTCCT 
CCCGTAATGT 
TAAAAAAACC 



This corresponds to the amino acid sequence <SEQ ID 808; ORF125ng-l>: 

1 MSGNASSPSS SAAIGLVWFG AAVSIAEIST GTLLAPLGWQ RGLAALLLGH 

51 AVGG ALFFAA AYIGALTGRS SMESVRLSFG KCGSVLFSVA NMLQLAGWTA 

101 VMIYVGATVS SALGKVLWDG ES FVWWALAN GALIVLWLV F GARRTGGLKT 

151 VS MLLMLLAV LWLSVEVFA S SGTNAAPAVS DGMTFGTAVE LSAVMPLSWL 

201 PLAADYTRQA RRPFAATLTA TLAYTLTGCW MYALGLAAAL FTGETDVAKI 

251 LLGAGLGITG ILAWL STVT TTFLDTYSAG ASANNISARF AE IPVAVGVT 

301 LIGTVLAVM L PVTEYKN FLL LIGSVFAPMA AVLI ADFFVL KRREEIEGFD 

351 F AGLVLWLAG FILYRFLL S5 GWESSIGLTA PVMSAVAIAT VSVRLFF KKT 

401 QSLQRNPS* 

ORF125ng-l and ORF125-1 show 95.1% identity in 408 aa overlap: 

10 20 30 40 50 60 

orfl2 5-l.pep MSGNASSPSSSSAIGLIWFGAAVSIAEISTGTLLAPLGWQRGLAALLLGHAVGGALFFAA 

orfl2 5ng-l MSGNASSPSSSAAIGLVWFGAAVSIAEISTGTLLAPLGWQRGLAALLLGHAVGGALFFAA 



orfl25-l.pep 



AYIGALTGRSSMESVRLSFGKRGSVLFSVANMLQLAGWTAVMIYAGATVSSALGKVLWDG 
AYIGALTGRSSMESVRLSFGKCGSVLFSVANMLQLAGWTAVMIYVGATVSSALGKVLWDG 



orf 125-1 .pep 



ESFVWWALANGALIVLWLVFGARKTGGLKTVSMLLMLLAVLWLSAEVFSTAGSTAAQ-VS 
ESFVWWALANGALIVLWLVFGARRTGGLKTVSMLLMLLAVLWLSVEVFASSGTNAAPAVS 



orf 125-1. pep 



DGMSFGTAVELSAVMPLSWLPLAADYTRHARRPFAATLTATLAYTLTGCWMYALGLAAAL 
DGMTFGTAVELSAVMPLSWLPLAADYTRQARRPFAATLTATLAYTLTGCWMYALGLAAAL 



orf 125-1 . pep FTGETDVAKILLGAGLGAAGILAWLSTVTTTFLDAYSAGASANNISARFAETPVAVGVT 
orfl25ng-l FTGETDVAKILLGAGLGITGILAWLSTVTTTFLDTYSAGASANNISARFAEIPVAVGVT 
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300 310 320 330 340 350 359 

orf 125-1. pep LIGTVLAVMLPVTEYENFLLLIGSVFAPMAAVLIADFFVLKRREEIEGFDFAGLVLWLAG 

orfl25ng-l LIGTVLAVMLPVTEYKNFLLLIGSVFAPMAAVLIADFFVLKRREEIEGFDFAGLVLWLAG 
310 320 330 340 350 360 

360 370 380 390 400 

orf 125-1 .pep FILYRFLLSSGWESSIGLTAPVMSAVAIATVSVRLFFKKTQSLQRNPSX 

orfl25ng-l FILYRFLLS SGWE S SIGLTAPVMSAVAIATVSVRLFFKKTQSLQRNPSX 

370 380 390 400 

Based on this analysis, including the presence of putative leader sequence and transmembrane 
domains in the gonococcal protein, it is predicted that the proteins from N. meningitidis and 
N.gonorrhoeae, and their epitopes, could be useflil antigens for vaccines or diagnostics, or for 
raising antibodies. 



Example 96 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 809>: 

1 ATGACCCGTA TCGCCATCCT CGGCGGCGGC CTCTCGGGAA GGCTGACCGC 

51 GTTGCAGCTT GCAGAACAAG GTTATCAGAT TGCACTTTTC GATAAAAGCT 

101 GCCGCCGGGG CGAACACGCC GCCGCCTATG TAGCCGCCGC CATGCTCGCG 

151 CCTGCAGCGG A.ACGGTCGA AGCCACGCCC GAAGTGGTCA GGCTGGGCAG 

201 GCAGAGCATC CCGCTTTGGC GCGGCATCCG ATGCCGTCTG AACACGCACA 

251 CGATGATGCA GGAAAACGGC AGCCTGATTG TATGGCACGG GCAGGACAAG 

301 CCATTATCCA GCGAGTTCGT CCGCCATCTC AAACGCGGCG GCGT.ACGGA 

351 TGACGAAATC GTCCGTTGGC GCGCCGACGA CATCGCCGAA CGCGAACCGC 

401 AACTCGGCGG ACGTTTTTAA GACGGCATCT ACCTGCCGAC CGAAGC . CAG 

451 CTCGACGGGC GGCAATTATA GTCTGCACTT GCCGACGCTT TGGACGAACT 

501 GAACGTCCCC TGCCATTGGG AACACGAATG CGTCCCCGAA GCCTGCAAG . . 

This corresponds to the amino acid sequence <SEQ ID 810; ORF126>: 

1 MTRIAILGGG LSGRLTALQL AEQGYQIALF DK3CRRGEHA AAYVAAAMLA 

51 PAAXTVEATP EWRLGRQSI PLWRGIRCRL NTHTMMQENG SLIVWHGQDK 

101 PLSSEFVRHL KRGGXTDDEI VRWRADDIAE REPQLGGRFX DGIYLPTEXQ 

151 LDGRQLXSAL ADALDELNVP CHWEHECVPE ACK. . . 

Further work revealed the complete nucleotide sequence <SEQ ID 811>: 

1 ATGACCCGTA TCGCCATCCT CGGCGGCGGC CTCTCGGGAA GGCTGACCGC 

51 GTTGCAGCTT GCAGAACAAG GTTATCAGAT TGCACTTTTC GATAAAGGCT 

101 GCCGCCGGGG CGAACACGCC GCCGCCTATG TTGCCGCCGC CATGCTCGCG 

151 CCTGCGGCGG AAGCGGTCGA AGCCACGCCC GAAGTGGTCA GGCTGGGCAG 

201 GCAGAGCATC CCGCTTTGGC GCGGCATCCG ATGCCGTCTG AACACGCACA 

251 CGATGATGCA GGAAAACGGC AGCCTGATTG TGTGGCACGG GCAGGACAAG 

301 CCATTATCCA GCGAGTTCGT CCGCCATCTC AAACGCGGCG GCGTAGCGGA 

351 TGACGAAATC GTCCGTTGGC GCGCCGACGA CATCGCCGAA CGCGAACCGC 

401 AACTCGGCGG ACGTTTTTCA GACGGCATCT ACCTGCCGAC CGTiAGGCCAG 

451 CTCGACGGGC GGCAAATATT GTCTGCACTT GCCGACGCTT TGGACGAACT 

501 GAACGTCCCC TGCCATTGGG AACACGAATG CGTCCCCGTVA GGCCTGCAAG 

551 CCCAATACGA CTGGCTGATC GACTGCCGCG GCTACGGCGC AAAAACCGCG 

601 TGGAACCAAT CCCCCGAGCA CACCAGCACC CTGCGCGGCA TACGCGGCGA 

651 AGTGGCGCGG GTTTACACAC CCGAAATCAC GCTCAACCGC CCCGTGCGTC 

701 TGCTCCATCC GCGTTATCCG CTCTACATCG CCCCGAAAGA AAACCACGTC 

751 TTCGTCATCG GCGCGACCCA AATCGAAAGC GAAAGCCAAG CCCCCGCCAG 

801 CGTGCGTTCA GGGTTGGAAC TCTTGTCCGC ACTCTATGCC ATCCACCCCG 

851 CCTTCGGCGA AGCCGACATC CTCGAAATCG CCACCGGCCT GCGCCCCACG 

901 CTCAACCACC ACAACCCCGA AATCCGTTAC AACCGCGCCC GACGCCTGAT 

951 TGAAATCAAC GGCCTTTTCC GCCACGGTTT CATGATCTCC CCCGCCGTAA 

1001 CCGCCGCCGC CGCCAGATTG GCAGTGGCAC TGTTTGACGG AAAAGACGCG 

1051 CCCGAACGCG ATAAAGAAAG CGGTTTGGCG TATATCCGAA GACAAGATTA 

1101 A 
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This corresponds to the amino acid sequence <SEQ ID 812; ORF126-l>: 

1 MTRIAILGGG LSGRLTALQL AEQGYQIALF DKGCRRGEHA AAYVAAAMLA 

51 PAAEAVEATP EVVRLGRQSI PLWRGIRCRL NTHTMMQENG SLIVWHGQDK 

101 PLSSEFVRHL KRGGVADDEI VRWRADDIAE REPQLGGRFS DGIYLPTEGQ 

151 LDGRQILSAL ADALDELNVP CHWEHECVPE GLQAQYDWLI DCRGYGAKTA 

201 WNQSPEHTST LRGIRGEVAR VYTPEITLNR PVRLLHPRYP LYIAPKENHV 

251 FVIGATQIES ESQAPASVRS GLELLSALYA IHPAFGEADI LEIATGLRPT 

301 LNHHNPEIRY NRARRLIEIN GLFRHGFM IS PAVTAAAARL AVALF DGKDA 

351 PERDKESGLA YIRRQD* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meninsitidis (strain A) 

ORF126 shows 90.0% identity over a ISOaa overlap with an ORF (ORF126a) from strain A oiN. 
meningitidis: 

10 20 30 40 50 60 

or f 12 6 . pep MTRIAILGGGL3GRLTALQLAEQGYQIALFDKSCRRGEHAAAYVAAAMLAPAAXTVEATP 

or f 12 6a MTRIAILGGGLSGRLTALQLAEQGYQIALFDKGCRRGEHAAAYVAAAMLAPAAEAVEATP 
10 20 30 40 50 60 



70 80 90 100 110 120 

orfl2 6.pep EWRLGRQSIPLWRGIRCRLNTHTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGXTDDEI 

orfl2 6a EWRLGRQXIPLWRGIRCHLKTPAMMXENGSLIVWHGQDKPLSNEFVRHLKRGGVADDXI 
70 BO 90 100 110 120 



130 140 150 160 170 180 

orfl2 5.pep VRWRADDIAEREPQLGGRFXDGIYLPTEXQLDGRQLXSALADALDELNVPCHWEHECVPE 

o r f 1 2 6a VRWRADDIAERE PQLGGRFS DGI YLPTEGQLDGRQI LSALADALDELNVPCHWEHECAPE 

130 140 150 160 170 180 

The complete length ORF126a nucleotide sequence <SEQ ID 813> is: 



1 ATGACCCGTA TCGCCATCCT CGGCGGCGGC CTCTCNGGAA GGCTGACCGC 

51 ACTGCAGCTT GCAGAACAAG GTTATCAGAT TGCACTTTTC GATAAAGGCT 

101 GCCGCCGGGG CGAACACGCC GCCGCCTATG TTGCCGCCGC CATGCTCGCG 

151 CCTGCGGCGG AAGCGGTCGA AGCCACGCCT GAAGTGGTCA GGCTGGGCAG 

201 GCAGANCATC CCGCTTTGGC GCGGCATCCG ATGCCATCTG AAAACGCCTG 

251 CCATGATGCA NGAAAACGGC AGCCTGATTG TGTGGCACGG GCAGGACAAA 

301 CCTTTATCCA ACGAGTTCGT CCGCCATCTC AAACGCGGCG GCGTAGCGGA 

351 TGACNAAATC GTCCGTTGGC GCGCCGACGA CATCGCCGAA CGCGAACCGC 

4 01 AACTCGGCGG ACGTTTTTCA GACGGCATCT ACCTGCCGAC CGAAGGCCAG 

451 CTCGACGGGC GGCAAATATT GTCTGCACTT GCCGACGCTT TGGACGAACT 

501 GAACGTCCCC TGCCATTGGG AACACGAATG TGCCCCCGAA GACTTGCAAG 

551 CCCAATACGA CTGGCTGATC GACTGCCGCG GCTACGGCGC AAAAACCGCG 

601 TGGAACCAAT CCCCCGANNA NACCAGCACC CTGCGCGGCA TACGCGGCGA 

651 AGTGGCGCGG GTTTACACAC CCGAAATCAC GCTCAACCGC CCCGTGCGCC 

701 TGCTACACCC GCGCTATCCG CTNTACATCG CCCCGAAAGA AAACCNCGTC 

751 TTCGTCATCG GCGCGACCCA AATCGAAAGC GAAAGCCAAG CACCTGCCAG 

801 CGTGCGTTCC GGGCTGGAAC TCTTATCCGC ACTCTATGCC GTCCACCCCG 

851 CCTTCGGCGA AGCCGACATC CTCGAAATCG CCACCGGCCT GCGCCCCACG 

901 CTCAATCACC ACAACCCCGA AATCCGTTAC AACCGCGCCC GACGCCTGAT 

951 TGAAATCAAC GGCCTTTTCC GCCACGGTTT CATGATCTCC CCCGCCGTAA 

1001 CCGCCGCCGC CGTCAGATTG GCAGTGGCAC TGTTTGACGG AAAAGANGCG 

1051 CCCGAACGCG ATGAAGAAAG CGGTTTGGCG TATATCCGAA GACAAGATTA 

1101 A 

This encodes a protein having amino acid sequence <SEQ ID 814>: 

1 MTRIAILGGG LSGRLTALQL AEQGYQIALF DKGCRRGEHA AAYVAAAMLA 

51 PAAEAVEATP EWRLGRQXI PLWRGIRCHL KTPAMMXENG SLIVWHGQDK 

101 PLSNEFVRHL KRGGVADDXI VRWRADDIAE REPQLGGRFS DGIYLPTEGQ 

151 LDGRQILSAL ADALDELNVP CHWEHECAPE DLQAQYDWLI DCRGYGAKTA 

201 WNQSPXXTST LRGIRGEVAR VYTPEITLNR PVRLLHPRYP LYIAPKENXV 
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FVIGATQIES ESQAPASVRS GLELLSALYA VHPAFGEADI LEIATGLRPT 
LNHHNPEIRY NRARRLIEIN GLFRHGFM IS PAVTAAAVRL AVALF DGKXA 
PERDEESGLA YIRRQD* 



ORF126a and ORF126-1 show 95.4% identity in 366 aa overlap: 



MTRIAILGGGLSGRLTALQLAEQGYQIALFDKGCRRGEHAAAYVAAAMLAPAAEAVEATP 
MTRIAILGGGLSGRLTALQLAEQGYQIALFDKGCRRGEHAAAYVAAAMLAPAAEAVEATP 



EWRLGRQXIPLWRGIRCHLKTPAMMXENGSLIVWHGQDKPLSNEFVRHLKRGGVADDXI 
EVVRLGRQSIPLWRGIRCRLNTHTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEI 



or f 12 6a. pep VRWRADDIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPE 
orf 12 6-1 VRWRADDIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECVPE 



orfl2 6a.pep DLQAQYDWLIDCRGYGAKTAWNQSPXXTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYP 
orf 126-1 GLQAQYDWLIDCRGYGAKTAWNQSPEHTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYP 



LYIAPKENXVFVIGATQIESESQAPASVRSGLELLSALYAVHPAFGEADILEIATGLRPT 
LYIAPKENHVFVIGATQIESESQAPASVRSGLELLSALYAIHPAFGEADILEIATGLRPT 



LNHHNPEIRYNRARRLIEINGLFRHGFMISPAVTAAAVRLAVALFDGKXAPERDEESGLA 
I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I : I I I I I I I I I I I I I I I : I I I I I 
LNHHNPEIRYNRARRLIEINGLFRHGFMISPAVTAAAARLAVALFDGKDAPERDKESGLA 



orf 12 6a. pep YIRRQDX 
orfl26-l YIRRQDX 

Homology with a predicted ORF from N.sonorrhoeae 

ORF126 shows 90% identity over a 180 aa overlap with a predicted ORF (ORF126ng) from 
N. gonorrhoeae: 

or f 12 6 . pep MTRIAILGGGLSGRLTALQLAEQGYQIALFDKSCRRGEHAAAYVAAAMLAPAAXTVEATP 60 
orfl2 6ng MTRIAVLGGGLSGRLTALQLAEQGYQIELFDKGTRQGEHAAAYVAAAMLAPAAEAVEATP 60 

orf 12 6. pep EWRLGRQSIPLWRGIRCRLNTHTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGXTDDEI 120 

EVIRLGRQSIPLWRGIRCRLNTLTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEI 120 



VRWRADDIAEREPQLGGRFXDGIYLPTEXQLDGRQLXSALADALDELNVPCHWEHECVPE 
I I I I I I : I I I I I I I I I I I I I I M I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I : I : 
VRWRADEIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPQ 



orfl26ng 
orfl26.pep 
orfl26ng 

An ORF126ng nucleotide sequence <SEQ ID 815> was predicted to encode a protein having amino 
acid sequence <SEQ ID 816>: 

1 MTRIAVLGGG LSGRLTALQL AEQGYQIELF DKGTRQGEHA AAYVAAAMLA 
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51 PAAEAVEATP EVIRLGRQSI PLWRGIRCRL NTLTMMQENG SLIVWHGQDK 

101 PLSSEFVRHL KRGGVADDEI VRWRADEIAE REPQLGGRFS DGIYLPTEGQ 

151 LDGRQILSAL ADALDELNVP CHWEHECAPQ DLQAQYDWVI DCRGYGAKTA 

201 WNQSPEHTST LRGIRGEVRG FTRPKSRSTA PCACCTRAIR STSPRKKTTS 

251 SSSARPKSKA KAKPPPAYVP GWNSYPRSMP STPPSAKPTS SKWRPGLRPT 

301 LNHHNPEIRY SRERRLIEIN GLFRHGFM IS PAVTAAAVRL AVALF DGKDA 

351 PERDEESGLA YIGRQD* 

Further work revealed the following gonococcal DNA sequence <SEQ ID 817>: 



1 ATGACCCGTA TCGCCGTCCT CGGAGGCGGC CTTTCCGGAA GGCTGACCGC 

51 ATTGCAGCTT GCAGAACAAG GTTATCAGAT TGAACTTTTC GACAAGGGCA 

101 CCCGCCAAGG CGAACACGCC GCCGCCTATG TTGCCGCCGC GATGCTCGCG 

151 CCTGCGGCGG AAGCGGTCGA GGCAACGCCC GAAGTCATCA GGCTGGGCAG 

2 01 GCAGAGCATT CCGCTTTGGC GCGGCATCCG ATGCCGTCTG AACACGCTCA 

251 CGATGATGCA GGAAAACGGC AGCCTGATTG TGTGGCACGG GCAGGACAAG 

301 CCATTATCCA GCGAGTTCGT CCGCCATCTC AAACGCGGCG GCGTAGCGGA 

351 TGACGAAATC GTCCGTTGGC GCGCCGATGA AATCGCCGAA CGCGAACCGC 

4 01 AACTCGGCGG ACGTTTTTCA GACGGCATCT ACCTGCCGAC CGAAGGCCAG 

4 51 CTCGACGGGC GGCAAATATT GTCTGCACTT GCCGACGCTT TGGACGAACT 

501 GAACGTCCCT TGCCATTGGG AACACGAATG CGCCCCCCAA GACCTGCAAG 

551 CCCAATACGA CTGGGTAATC GACTGCCGGG GCTACGGCGC GAAAACCGCG 

601 TGGAACCAAT CCCCCGAGCA CACCAGCACC TTGCGCGGCA TACGCGGCGA 

651 AGTGGCGCGG GTTTACACGC CCGAAATCAC GCTCAACCGC CCCGTGCGCC 

701 TGCTGCACCC GCGCTATCCG CTCTACATCG CCCCGAAAGA AAACCACGTC 

7 51 TTCGTCATCG GCGCGACCCA AATCGAAAGC GAAAGCCAAG CCCCCGCCAG 

801 CGTACGTTCC GGGCTGGAAC TCTTATCCGC GCTCTATGCC GTCCACCCCG 

851 CCTTCGGCGA AGCCGACATC CTCGAAATCG CCGCCGGCCT GCGCCCCACG 

901 CTCAACCACC ACAACCCCGA AATCCGCTAC AGCCGCGAAC GCCGCCTCAT 

951 CGAAATCAAC GGCCTTTTCC GGCACGGCTT TATGATTTCC CCCGCCGTAA 

1001 CCGCCGCCGC CGTCAGATTG GCAGTGGCAC TGTTTGACGG AAAAGACGCG 

1051 CCCGAACGTG ATGAAGAAAG CGGTTTGGCG TATATCGGAA GACAAGATTA 

1101 A 

This corresponds to the amino acid sequence <SEQ ED 818; ORF126ng-l>: 



1 MTRIAVLGGG LSGRLTALQL AEQGYQIELF DKGTRQGEHA AAYVAAAMLA 

51 PAAEAVEATP EVIRLGRQSI PLWRGIRCRL NTLTMMQENG SLIVWHGQDK 

101 PLSSEFVRHL KRGGVADDEI VRWRADEIAE REPQLGGRFS DGIYLPTEGQ 

151 LDGRQILSAL ADALDELNVP CHWEHECAPQ DLQAQYDWVI DCRGYGAKTA 

201 WNQSPEHTST LRGIRGEVAR VYTPEITLNR PVRLLHPRYP LYIAPKENHV 

251 FVIGATQIES ESQAPASVRS GLELLSALYA VHPAFGEADI LEIAAGLRPT 

301 LNHHNPEIRY SRERRLIEIN GLFRHGFM IS PAVTAAAVRL AVALF DGKDA 

351 PERDEESGLA YIGRQD* 

ORF126ng-l and ORF126-1 show 95.1% identity in 366 aa overlap: 

10 20 30 40 50 60 

orf 126-1 . pep MTRIAILGGGLSGRLTALQLAEQGYQIALFDKGCRRGEHAAAYVAAAMLAPAAEAVEATP 

orfl26ng-l MTRIAVLGGGLSGRLTALQLAEQGYQIELFDKGTRQGEHAAAYVAAAMLAPAAEAVEATP 
10 20 30 40 50 60 



70 80 90 100 110 120 

orf 12 6-1. pep EWRLGRQSIPLWRGIRCRLNTHTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEI 

orfl26ng-l EVIRLGRQSIPLWRGIRCRLNTLTMMQENGSLIVWHGQDKPLS3EFVRHLKRGGVADDEI 
70 80 90 100 110 120 



130 140 150 160 170 180 

orf 12 6-1 . pep VRWRADDIAEREPQLGGRF3DGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECVPE 
I I I I I i : I I I i I I I ! I ! I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I i I I I i 1 I : I : 
orfl2 6ng-l VRWRADEIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPQ 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 12 6-1. pep GLQAQYDWLIDCRGYGAKTAWNQSPEHTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYP 

orfl26ng-l DLQAQYDWVIDCRGYGAKTAWNQSPEHTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYP 
190 200 210 220 230 240 



wo 99/24578 



-450- 



PCT/IB98/01665 



250 260 270 280 290 300 

orfl2 6-l.pep LYIAPKENHVFVIGATQIESESQAPASVRSGLELLSALYAIHPAFGEADILEIATGLRPT 

orfl2 6ng-l LYIAPKENHVFVIGATQIESESQAPASVRSGLELLSALYAVHPAFGEADILEIAAGLRPT 
250 260 270 280 290 300 

310 320 330 340 350 360 

orfl2 6-l.pep LNHHNPEIRYNRARRLIEINGLFRHGFMISPAVTAAAARLAVALFDGKDAPERDKESGLA 

orfl2 6ng-l LNHHNPEIRYSRERRLIEINGLFRHGFMISPAVTAAAVRLAVALFDGKDAPERDEESGLA 
310 320 330 340 350 360 



orfl26-l .pep 
orfl26ng-l 



YIRRQDX 

II I I I I 
YIGRQDX 



Furthermore, ORF126ng-l shows homology to a putative Rhizobium oxidase flavoprotein: 

) acid oxidase flavoprotein [Rhizobium et: 



gi I 2627327 (AF004408) putative am: 

Length = 327 
Score = 169 bits (423), Expect = 
Identities = 112/329 (34%), Posi1 



163/329 (49%), Gaps = 25/329 (7%) 



Query: 3 RIAVLGGGLSGRLTALQLAEQGYQIELFDKGTRQGEHXXXXXXXXXXXXXXXXXXXXXXX 62 

RI V G G++G A QL G+++ L ++ G 
Sbjct: 2 RILVNGAGVAGLTVAWQLYRHGFRVTLAERAGTVGA-GASGFAGGMLAPWCERESAEEPV 60 

Query: 63 IRLGRQSIPLWRGIRCRLNTLTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEIVR 122 

+ LGR + W + G+L+V G+D F R G DE+ 

Sbjct: 61 LTLGRLAADWWEAA LPGHVHRRGTLWAGGRDTGELDRFSRRTS-GWEWLDEVA- 113 

Query: 123 WRADEIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPQDL 182 

lA EP L GRF ++ E LD RQ L+ALA L++ + + 
Sbjct: 114 lAALEPDLAGRFRRALFFRQEAHLDPRQALAALAAGLEDARMRLTLG WGES 165 

Query: 183 QAQYDWVIDCRGYGAKTAWNQSPEHTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYPLY 242 

+D V+DC G LRG+RGE+ V T E++L+RPVRLLHPR+P+Y 

Sbjct: 166 DVDHDRVVDCTGAA QIGRLPGLRGVRGEMLCVETTEVSLSRPVRLLHPRHPIY 218 

Query: 243 lAPKENHVFVIGATQIESESQAPASVRSGLELLSALYAVHPAFGEADILEIAAGLRPTLN 302 

I P++ + F++GAT IES+ P + RS +ELL+A YA+HPAFGEA + E AG+RP 
Sbjct: 219 IVPRDKNRFMVGATMIESDDGGPITARSLMELLNAAYAMHPAFGEARVTETGAGVRPAYP 278 

Query: 303 HHNPEIRYSRERRLIEINGLFRHGFMISP 331 

+ P R ++E R + +NGL+RHGF+++P 
Sbjct: 27 9 DNLP—RVTQEGRTLHVNGLYRHGFLLAP 305 

This analysis suggests that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, 
could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 97 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 
819>: 

1 ATGACTGATA ATCGGGGGTT TACGCTGGTT GAATTAATAT CAGTGGTCTT 

51 GATATTGTCT GTACTTGCTT TAATTGTTTA TCCGAGCTAT CGCAATTATG 

101 TTGAGAAAGC AAAGATAAAT GCAGTGCGGG CAGCCTTGTT AGAAAATGCA 

151 CATTTTATGG AAAAGTTTTA TCTGCAGAAT GGGAGGTTTA AACAAACATC 

201 TACCAAGTGG CCAAGTTTGC CGATTAAAGA GGCAGAAGGC TTTTGTATCC 

251 GTTTGAATGG AATCGtCGCG CGGG..GCTT TAGACAGTAA ATTCATGTTG 

301 AAGGCGGTAG CCATAGATAA AGATAAAAAT CCTTTTATTA TTAAGATGAA 

351 TGAAAATCTA GTAACCTTTA aTTTGCAAGA AGTCCGCCAG TTCGTGTAGT 

401 GACGGGCTGG ATTATTTTAA AGGAAATGAT AAGGACTGCA AGTTACTTAA 

451 GTAG 
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This corresponds to the amino acid sequence <SEQ ID 820; ORF127>: 

1 MTDNRGFTLV ELISVVLILS VLALIVYPSY RNYVEKAKIN AVRAALLENA 
51 HFMEKFYLQN GRFKQTSTKW PSLPIKEAEG FCIRLNGIVA RXALDSKFML 
101 KAVAIDKDKN PFIIBCMNENL VTFICKKSAS SCSDGLDYFK GNDKDCKLLK 
151 * 

Further work revealed the following DNA sequence <SEQ ID 82 1>: 

1 ATGACTGATA ATCGGGGGTT TACGCTGGTT GAATTAATAT CAGTGGTCTT 

51 GATATTGTCT GTACTTGCTT TAATTGTTTA TCCGAGCTAT CGCAATTATG 

101 TTGAGAAAGC AAAGATAAAT GCAGTGCGGG CAGCCTTGTT AGAAAATGCA 

151 CATTTTATGG AAAAGTTTTA TCTGCAGAAT GGGAGGTTTA AACAAACATC 

201 TACCAAGTGG CCAAGTTTGC CGATTAAAGA GGCAGAAGGC TTTTGTATCC 

251 GTTTGAATGG AATCGCGCGC GGGGCTTTAG ACAGTAAATT CATGTTGAAG 

301 GCGGTAGCCA TAGATAAAGA TAAAAATCCT TTTATTATTA AGATGAATGA 

351 AAATCTAGTA ACCTTTATTT GCAAGAAGTC CGCCAGTTCG TGTAGTGACG 

401 GGCTGGATTA TTTTAAAGGA AATGATAAGG ACTGCAAGTT ACTTAAGTAG 

This corresponds to the amino acid sequence <SEQ ID 822; ORF127-l>: 

1 MTDNRGFTL V ELISVVLILS VLALIV YPSY RNYVEKAKIN AVRAALLENA 
51 HFMEKFYLQN GRFKQTSTKW PSLPIKEAEG FCIRLNGIAR GALDSKFMLK 
101 AVAIDKDKNP FIIKMNENLV TFICKKSASS CSDGLDYFKG NDKDCKLLK* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF127 shows 98.0% identity over a 150aa overlap with an ORF (ORF127a) from strain A of A^. 



10 20 30 40 50 60 

MTDNRGFTLVELISWLIL3VLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN 

MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINTVRAALLENAHFMEKFYLQN 
10 20 30 40 50 60 

70 80 90 100 110 120 

GRFKQTSTKWPSLPIKEAEGFCIRLNGIVARXALDSKFMLKAVAIDKDKNPFIIKMNENL 

GRFKQTSTKWPSLPIKEAEGFCIRLNGI-ARGALDSKFMLKAVAIDKDKNPFIIKMNENL 
70 80 90 100 110 

130 140 150 

VTFICKKSASSCSDGLDYFKGNDKDCKLLKX 

VTFICKKSASSCSDGLDYFKGNDKDCKLLKX 
120 130 140 150 

The complete length ORF127a nucleotide sequence <SEQ ID 823> is: 

1 ATGACTGATA ATCGGGGGTT TACGCTGGTT GAATTAATAT CAGTGGTCTT 

51 GATATTGTCT GTACTTGCTT TAATTGTTTA TCCGAGCTAT CGCAATTATG 

101 TTGAGAAAGC AAAGATAAAT ACAGTGCGGG CAGCCTTGTT AGAAAATGCA 

151 CATTTTATGG AAAAGTTTTA TCTGCAGAAT GGGAGATTTA AACAAACATC 

201 TACCAAATGG CCAAGTTTGC CGATTAAAGA GGCAGAAGGC TTTTGTATCC 

251 GTTTGAATGG AATCGCGCGC GGGGCCTTAG ACAGTAAATT CATGTTGAAG 

301 GCGGTAGCCA TAGATAAAGA TAAAAATCCT TTTATTATTA AGATGAATGA 

351 AAATCTAGTA ACCTTTATTT GCAAGAAGTC CGCCAGTTCG TGTAGTGACG 

401 GGCTGGATTA TTTTAAAGGA AATGATAAGG ACTGCAAGTT ACTTAAGTAG 

This encodes a protein having amino acid sequence <SEQ ID 824>: 



meningitidis: 

orfl27.pep 
orfl27a 

orfl27.pep 
orfl27a 

orfl27.pep 
orfl27a 



1 MTDNRGFTL V ELISVVLILS VLALIV YPSY RNYVEKAKIN TVRAALLENA 
51 HFMEKFYLQN GRFKQTSTKW PSLPIKEAEG FCIRLNGIAR GALDSKFMLK 
101 AVAIDKDKNP FIIKMNENLV TFICKKSASS CSDGLDYFKG NDKDCKLLK* 
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ORF127a and ORF127-1 show 99.3% identity in 149 aa overlap: 

10 20 30 40 50 60 

or f 127a. pep MTDNRGFTLVELISWLILSVLALIVYPSYRNYVEKAKINTVRAALLENAHFMEKFYLQN 

or f 127-1 MTDNRGFTLVELISWLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN 
10 20 30 40 50 60 

70 80 90 100 110 120 

or f 12 7a. pep GRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLKAVAIDKDKNPFIIKMNENLV 

orf 127-1 GRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLKAVAIDKDKNPFIIKMNENLV 
70 80 90 100 110 120 

130 140 150 

orf 127a. pep TFICKKSASSCSDGLDYFKGNDKDCKLLKX 

orf 127-1 TFICKKSASSCSDGLDYFKGNDKDCKLLKX 
130 140 150 

Homology with a predicted ORF from N. gonorrhoeae 

ORF127 shows 97.3% identity over a 150 aa overlap with a predicted ORF (ORF127ng) from 
N.gonorrhoeae: 

orf 127 .pep MTDNRGFTLVELISWLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN 60 
orfl27ng MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAAFLENAHFMEKFYLQN 60 

orfl27 .pep GRFKQTSTKWPSLPIKEAEGFCIRLNGIVARXALDSKFMLKAVAIDKDKNPFIIKMNENL 120 
orfl27ng GRFKQTSTKWPSLPIKEAEGFCIRLNGI-ARGALDSKFMLKAVAIDKDKNPFIIKMNENL 119 

orf 127. pep VTFICKKSASSCSDGLDYFKGNDKDCKLLK 150 
orfl27ng VTFICKKSASSCSDRLDYFKGNDKDCKLLK 14 9 

The complete length ORF127ng nucleotide sequence <SEQ ID 825> is: 

1 ATGACTGATA ATCGGGGGTT TACACTGGTT GAATTAATAT CAGTGGTCTT 

51 GATATTGTCT GTACTTGCTT TAATTGTTTA TCCGAGCTAT CGCAATTATG 

101 TTGAGAAAGC AAAGATAAAT GCAGTGCGGG CAGCCTTGTT AGAAAATGCA 

151 CATTTTATGG AAAAGTTTTA TCTGCAGAAT GGGAGATTTA AACAAACATC 

201 TACCAAATGG CCAAGTTTGC CGATTAAAGA GGCAGAAGGC TTTTGTATCC 

251 GTTTGAATGG AATCGCGCGC GGGGCTTTAG ACAGTAAATT CATGTTGAAG 

301 GCGGTAGCCA TAGATAAAGA TAAAAATCCT TTTATTATTA AGATGAATGA 

351 AAATCTAGTA ACCTTTATTT GCAAGAAGTC CGCCAGTTCG TGTAGTGACG 

401 GGCTGGATTA TTTTAAAGGA AATGATAAGG ACTGCAAGTT ACTTAAGTAG 

This encodes a protein having amino acid sequence <SEQ ID 826>: 

1 MTDNRGFTL V ELISVVLILS VLALIV YPSY RNYVEKAKIN AVRAAFLENA 
51 HFMEKFYLQN GRFKQTSTKW PSLPIKEAEG FCIRLNGIAR GALDSKFMLK 
101 AVAIDKDKNP FIIKMNENLV TFICKKSASS CSDRLDYFKG NDKDCKLLK* 

ORF127ng and ORF127-1 show 100.0% identity in 149 aa overlap: 

10 20 30 40 50 60 

orf 127-1 . pep MTDNRGFTLVELISWLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

orfl27ng-l MTDNRGFTLVELISWLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN 
10 20 30 40 50 50 

70 80 90 100 110 120 

orf 127-1. pep GRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLKAVAIDKDKNPFIIKMNENLV 

orfl27ng-l GRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLKAVAIDKDKNPFIIKMNENLV 
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orf 127-1. pep TFICKKSASSCSDGLDYFKGNDKDCKLLKX 

orfl27ng-l TFICKKSASSCSDGLDYFKGNDKDCKLLKX 
130 140 150 

This analysis, including the fact that the predicted transmembrane domain is shared by the 
meningococcal and gonococcal proteins, suggests that the proteins from N. meningitidis and 
N.gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 



Example 98 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 827> 

1 ..GTGTCGCTGG CTTCGGTGAT TGCCTCTCAA ATCTTCCTTT ACGAAGATTT 
51 CAACCAAATG CGGAAAACCC GTGGAGCTAT CTGCGGTTTT CTTGTCCAAT 
101 ATTTATCTGG GGTTTCAGCA GGGGTATTTC GATTTGAGTG CCGACGAGAA 
151 CCCCGTACTG CATATCTGGT CTTTGGCAGT AGAGGAACAG TATTACCTCC 
2 01 TGTATCCCCT TTTGCTGATA TTTTGCTGCA AAAAAACCAA ATCGCTACGG 
251 GTGCTGCGTA ACATCAGCAT CATCCTGTTT TTGATTTTGA CTGCCTCATC 
301 GTTTTTGCCA AGCGGGTTTT ATACCGACAT CCTCAACCAA CCCAATACTT 
351 ATTACCTTTC GACACTGAGG TTTCCCGAGC TGTTGGCAGG TTCGCTGCTG 
4 01 GCGGTTTACG GGCAAACGCA AAACGGCAGA CGGCAAACAG CAAATGGAAA 
4 51 ACGGCAGTTG CTTTCATCAC TCTGCTTCGG CGCATTGCTT GCCTGCCTGT 
501 TCGTGATTGA CAAACACAAT CCGTTTATCC CGGGAATGAC CCTGCTCCTT 
551 CCCTGCCTGC TGACGGCACT GCTTATCCGG AGTATGCAAT ACGGGACACT 
601 TCCGACCCGC ATCCTGTCGG CAAGCCCCAT CGTATTTGTC GGCAAAATCT 
651 CTTATTCCCT ATACCTGTAC CATTGGATTT TTATTGCTTT CGCTCCGCTC 
701 ATTAGAGGCG GGAAACAGCT CGGACTGCCT GCCG. . 

This corresponds to the amino acid sequence <SEQ ID 828; ORF128>: 

1 . . l/SLASVIASQ IFLYEDFNQM RKTVELSAVF LSNIYLGFQQ GYFDLSADEN 
51 PVLHIWSLAV EEQYYLLYPL LLIFCCKKTK SLRVLRNISI ILFLILTASS 
101 FLPSGFYTDI LNQPNTYYLS TLRFPELLAG SLLAVYGQTQ NGRRQTANGK 
151 RQLLSSLCFG ALLACLFVID KHNPFIPGMT LLLPCLLTAL LIRSMQYGTL 
201 PTRILSASPI VFVGKISYSL YLYHWIFIAF APLIRGGKQL GLPA. . 

Further work revealed the complete nucleotide sequence <SEQ LD 829>: 

1 ATGCAAGCTG TCCGATACAG ACCGGAAATT GACGGATTGC GGGCCGTCGC 

51 CGTGCTATCC GTCATGATTT TCCACCTGAA TAACCGCTGG CTGCCCGGAG 

101 GATTCCTGGG GGTGGACATT TTCTTTGTCA TCTCAGGATT CCTCATTACC 

151 GGCATCATTC TTTCTGAAAT ACAGAACGGT TCTTTTTCTT TCCGGGATTT 

201 TTATACCCGC AGGATTAAGC GGATTTATCC TGCCTTTATT GCGGCCGTGT 

251 CGCTGGCTTC GGTGATTGCC TCTCAAATCT TCCTTTACGA AGATTTCAAC 

301 CAAATGCGGA AAACCGTGGA GCTTTCTGCG GTTTTCTTGT CCAATATTTA 

351 TCTGGGGTTT CAGCAGGGGT ATTTCGATTT GAGTGCCGAC GAGAACCCCG 

4 01 TACTGCATAT CTGGTCTTTG GCAGTAGAGG AACAGTATTA CCTCCTGTAT 

451 CCCCTTTTGC TGATATTTTG CTGCAAAAAA ACCAAATCGC TACGGGTGCT 

501 GCGTAACATC AGCATCATCC TGTTTTTGAT TTTGACTGCC TCATCGTTTT 

551 TGCCAAGCGG GTTTTATACC GACATCCTCA ACCAACCCAA TACTTATTAC 

601 CTTTCGACAC TGAGGTTTCC CGAGCTGTTG GCAGGTTCGC TGCTGGCGGT 

651 TTACGGGCAA ACGCAAAACG GCAGACGGCA AACAGCAAAT GGAAAACGGC 

701 AGTTGCTTTC ATCACTCTGC TTCGGCGCAT TGCTTGCCTG CCTGTTCGTG 

751 ATTGACAAAC ACAATCCGTT TATCCCGGGA ATGACCCTGC TGCTTGCCTG 

801 CCTGCTGACG GCACTGCTTA TCCGGAGTAT GCAATACGGG ACACTTCCGA 

851 CCCGCATCCT GTCGGCAAGC CCCATCGTAT TTGTCGGCAA AATCTCTTAT 

901 TCCCTATACC TGTACCATTG GATTTTTATT GCTTTCGCCC ATTACATTAC 

951 AGGCGACAAA CAGCTCGGAC TGCCTGCCGT ATCGGCGGTT GCCGCGTTGA 

1001 CGGCCGGATT TTCCCTGTTG AGTTATTATT TGATTGAACA GCCGCTTAGA 

1051 AAACGGAAGA TGACCTTCAA AAAGGCATTT TTCTGCCTCT ATCTCGCCCC 

1101 GTCCCTGATA CTTGTCGGTT ACAACCTGTA CGCAAGGGGG ATATTGAAAC 

1151 AGGAACACCT CCGCCCGTTG CCCGGCGCGC CCCTTGCTGC GGAAAATCAT 
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1201 TTTCCGGAAA CCGTCCTGAC CCTCGGCGAC TCGCACGCCG GACACCTGAG 

1251 GGGGTTTCTG GATTATGTCG GCAGCCGGGA AGGGTGGAAA GCCAAAATCC 

1301 TGTCCCTCGA TTCGGAGTGT TTGGTTTGGG TAGATGAGAA GCTGGCAGAC 

1351 AACCCGTTAT GTCGAAAATA CCGGGATGAA GTTGAAAAAG CCGAAGCCGT 

5 14 01 TTTCATTGCC CAATTCTATG ATTTGAGGAT GGGCGGCCAG CCTGTGCCGA 

1451 GATTTGAAGC GCAATCCTTC CTAATACCCG GGTTCCCAGC CCGATTCAGG 

1501 GAAACCGTCA AAAGGATAGC CGCCGTCAAA CCCGTCTATG TTTTTGCAAA 

1551 CAACACATCA ATCAGCCGTT CGCCCCTGAG GGAGGAAAAA TTGAAAAGAT 

1601 TTGCCGCAAA CCAATATCTC CGCCCCATTC AGGCTATGGG CGACATCGGC 

10 1651 AAGAGCAATC AGGCGGTCTT TGATTTGATT AAAGATATTC CCAATGTGCA 

17 01 TTGGGTGGAC GCACAAAAAT ACCTGCCCAA AAACACGGTC GAAATATACG 

1751 GCCGCTATCT TTACGGCGAC CAAGACCACC TGACCTATTT CGGTTCTTAT 

1801 TATATGGGGC GGGAATTCCA CAAACACGAA CGCCTGCTTA AATCTTCCCA 

1851 CGGCGGCGCA TTGCAGTAG 

15 This corresponds to the amino acid sequence <SEQ ID 830; ORF128-l>: 

1 MQAVRYRPE I DGLRAVAVLS VMIFHL NNRW LPGGFLG VDI FFVISGFLIT 

51 GIIL SEIQNG SFSFRDFYTR RIKRIYP AFI AAVSX^iSVIA SQIFL YEDFN 

101 QMRKTVELSA VFLSNIYLGF QQGYFDLSAD ENPVLHIWSL AVEEQYYLLY 

151 PLLLIFCCKK TKSLRVLRN I SIILFLILTA SSFLPS GFYT DILNQPNTYY 

20 201 LSTLRFPELL AGSLLAVYGQ TQNGRRQTAN GKRQ LLSSLC FGALLACLFV 

251 IDKHNPF IPG MTLLLPCLLT ALLI RSMQYG TLPTRILSAS PIVFVGKISY 

301 SLYLYHWIFI AFAHYITGDK QL GXPAVSAV AALTAGFSLL SYYLIEQPLR 

351 KRKMTFKKAF FCLYLAPSLI LVGYNLYARG ILKQEHLRPL PGAPLAAENH 

401 FPETVLTLGD SHAGHLRGFL DYVGSREGWK AKILSLDSEC LVWVDEKLAD 

25 4 51 NPLCRKYRDE VEKAEAVFIA QFYDLRMGGQ PVPRFEAQSF LIPGFPARFR 

501 ETVKRIAAVK PVYVFANNTS ISRSPLREEK LKRFAANQYL RPIQAMGDIG 

551 KSNQAVFDLI KDIPNVHWVD AQKYLPKNTV EIYGRYLYGD QDHLTYFGSY 

601 YMGREFHKHE RLLKSSHGGA LQ* 

Computer analysis of this amino acid sequence gave the following results: 

30 Homology with hypothetical integral membrane protein H10392 oiH.influenzae (accession number U32723) 
ORF128 and HI0392 show 52% aa identity in 180aa overlap: 

Orfl28: 1 VSLASVIASQIFLYEDFNQMRKTVELSAVFLSNIYLGFQQGYFDLSADENPVLHIWSLAV 60 

++L S IAS IF+Y DFN++RKT+EL+ FLSN YLG QGYFDLSA+ENPVLHIWSLAV 
HI0392: 46 MALVSFIASAIFIYNDFNKLRKTIELAIAFLSNFYLGLTQGYFDLSANENPVLHIWSLAV 105 

35 

Orfl28: 61 EEQXXXXXXXXXIFCCKKTKSLRVLRNISIILFLILTASSFLPSGFYTDILNQPNTYYLS 120 

E Q I KK + ++VL I++ILF IL A+SF+ + FY ++L+QPN YYLS 

HI0392: 106 EGQYYLIFPLILILAYKKFREVKVLFIITLILFFILLATSFVSANFYKEVLHQPNIYYLS 165 

40 Orfl28: 121 TLRFPELLAGSLLAVYGQTQNGRRQTANGKRQLLSSLCFGALLACLFVIDKHNPFIPGMT 180 

LRFPELL GSLLA+Y N + Q + +L+ L L +CLF+++ + FIPG+T 

HI0392: 166 NLRFPELLVGSLLAIYHNLSN-KVQLSKQVNNILAILSTLLLFSCLFLMNNNIAFIPGIT 224 

Homology with a predicted ORF from N.meninsitidis (strain A) 
45 ORF128 shows 98.0% identity over a 244aa overlap with an ORF (ORF128a) from strain A of TV. 
meningitidis: 

10 20 30 

orf 128 .pep VSLASVIASQIFLYEDFNQMRKTVELSAVF 
50 orf 128a ILSEIQNGSFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTVELSAVF 



orf 128. pep LSNIYLGFQQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISI 
orf 128a L3NIYLGFQQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISI 



60 



orf 128 .pep 



100 110 120 130 140 150 

ILFLILTASSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGK 

I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I 1 I ! M I I I I I I I I I I I I I I 
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ILFLILTATSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGK 



orfl28 .pep RQLLSSLCFGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPI 
orfl28a RQLLSSLCFGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPI 



orf 128 .pep VFVGKISYSLYLYHWIFIAFAPLIRGGKQLGLPA 

orfl28a VFVGKISYSLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKR 



orf 128a KMTFKKAFFCLYLAPSLILVGYNLYARGILKQEHLRPLPGAPLAAENHFPETVLTLGDSH 
360 370 380 390 400 410 

The complete length ORF128a nucleotide sequence <SEQ ID 83 1> is: 

1 ATGCAAGCTG TCCGATACAG ACCGGAAATT GACGGATTGC GGGCCGTCGC 

51 CGTGCTATCC GTCATGATTT TCCACCTGAA TAACCGCTGG CTGCCCGGAG 

101 GATTCCTGGG GGTGGACATT TTCTTTGTCA TCTCAGGATT CCTCATTACC 

151 GGCATCATTC TTTCTGAAAT ACAGAACGGT TCTTTTTCTT TCCGGGATTT 

201 TTATACCCGC AGGATTAAGC GGATTTATCC TGCTTTTATT GCGGCCGTGT 

251 CGCTGGCTTC GGTGATTGCC TCTCAAATCT TCCTTTACGA AGATTTCAAC 

301 CAAATGCGGA AAACCGTGGA GCTTTCTGCG GTTTTCTTGT CCAATATTTA 

351 TCTGGGGTTT CAGCAGGGGT ATTTCGATTT GAGTGCCGAC GAGAACCCCG 

401 TACTGCATAT CTGGTCTTTG GCAGTAGAGG AACAGTATTA CCTCCTGTAT 

4 51 CCTCTTTTGC TGATATTTTG CTGCAAAAAA ACAAAATCGC TACGGGTGCT 

501 GCGTAACATC AGCATCATCC TATTTCTGAT TTTGACTGCC ACATCGTTTT 

551 TGCCAAGCGG GTTTTATACC GATATTCTCA ACCAACCCAA TACTTATTAC 

601 CTTTCGACAC TGAGGTTTCC CGAGCTGTTG GCAGGTTCGC TGCTGGCGGT 

651 TTACGGGCAA ACGCAAAACG GCAGACGGCA AACAGCAAAT GGAAAACGGC 

701 AGTTGCTTTC ATCACTCTGC TTCGGCGCAT TGCTTGCCTG CCTGTTCGTG 

751 ATTGACAAAC ACAATCCGTT TATCCCGGGA ATGACCCTGC TGCTTGCCTG 

801 CCTGCTGACG GCACTGCTTA TCCGGAGTAT GCAATACGGG ACACTTCCGA 

851 CCCGCATCCT GTCGGCAAGC CCCATCGTAT TTGTCGGCAA AATCTCTTAT 

901 TCCCTATACC TGTACCATTG GATTTTTATT GCTTTCGCCC ATTACATTAC 

951 AGGCGACAAA CAGCTCGGAC TGCCTGCCGT ATCGGCGGTT GCCGCGTTGA 

1001 CGGCCGGATT TTCCCTGTTG AGTTATTATT TGATTGAACA GCCGCTTAGA 

1051 AAACGGAAGA TGACCTTCAA AAAGGCATTT TTCTGCCTCT ATCTCGCCCC 

1101 GTCCCTGATA CTTGTCGGTT ACAACCTGTA CGCAAGGGGG ATATTGAAAC 

1151 AGGAACACCT CCGCCCGTTG CCCGGCGCGC CCCTTGCTGC GGAAAATCAT 

1201 TTTCCGGAAA CCGTCCTGAC CCTCGGCGAC TCGCACGCCG GACACCTGCG 

1251 GGGGTTTCTG GATTATGTCG GCAGCCGGGA AGGGTGGAAA GCCAAAATCC 

1301 TGTCCCTCGA TTCGGAGTGT TTGGTTTGGG TAGATGAGAA GCTGGCAGAC 

1351 AACCCGTTAT GTCGAAAATA CCGGGATGAA GTTGAAAAAG CCGAAGCCGT 

14 01 TTTCATTGCC CAATTCTATG ATTTGAGGAT GGGCGGCCAG CCCGTGCCGA 

1451 GATTTGAAGC GCAATCCTTC CTAATACCCG GGTTCCCAGC CCGATTCAGG 

1501 GAAACCGTCA AAAGGATAGC CGCCGTCAAA CCCGTCTATG TTTTTGCAAA 

1551 CAACACATCA ATCAGCCGTT CGCCCCTGAG GGAGGAAAAA TTGAAAAGAT 

1601 TTGCCGCAAA CCAATATCTC CGCCCCATTC AGGCTATGGG CGACATCGGC 

1651 AAGAGCAATC AGGCGGTCTT TGATTTGATT AAAGATATTC CCAATGTGCA 

17 01 TTGGGTGGAC GCACAAAAAT ACCTGCCCAA AAACACGGTC GAAATATACG 
1751 GCCGCTATCT TTACGGCGAC CAAGACCACC TGACCTATTT CGGTTCTTAT 

18 01 TATATGGGGC GGGAATTTCA CAAACACGAA CGCCTGCTTA AATCTTCTCG 
1851 CGACGGCGCA TTGCAGTAG 

This encodes a protein having amino acid sequence <SEQ ID 832>: 



1 MQAVRYRPE I DGLRAVAVLS VMIFHL NNRW LPGGFLG VDI FFVISGFLIT 

51 GIIL SEIQNG SFSFRDFYTR RIKRIYPA FI AAV5LASVIA SQIFL YEDFN 

101 QMRKTVELSA VFLSNIYLGF QQGYFDLSAD ENPVLHIWSL AVEEQYYLLY 

151 PLLLIFCCKK TKSLRVLRN I SIILFLILTA TSFLPS GFYT DILNQPNTYY 

201 LSTLRFPELL AGSLLAVYGQ TQNGRRQTAN GKRQ LLSSLC FGALLACLFV 

251 IDKHNPF IPG MTLLLPCLLT ALLI RSMQYG TLPTRILSAS PIVFVGKISY 

301 SLYLYHWIFI AFAHYITGDK QLG LPAVSAV AALTAGFSLL SYYLIEQPLR 

351 KRKMTFKKAF FCLYLAPSLI LVGYNLYARG ILKQEHLRPL PGAPLAAENH 

4 01 FPETVLTLGD SHAGHLRGFL DYVGSREGWK AKILSLD3EC LVWVDEKLAD 

451 NPLCRKYRDE VEKAEAVFIA QFYDLRMGGQ PVPRFEAQ3F LIPGFPARFR 
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ETVKRIAAVK PVYVFANNTS ISRSPLREEK LKRFAANQYL RPIQAMGDIG 
KSNQAVFDLI KDIPNVHWVD AQKYLPKNTV EIYGRYLYGD QDHLTYFGSY 
YMGREFHKHE RLLKSSRDGA LQ* 



MQAVRYRPEIDGLRAVAVLSVMIFHLNNRWLPGGFLGVDIFFVISGFLITGIILSEIQNG 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I 
MQAVRYRPEIDGLRAVAVLSVMIFHLNNRWLPGGFLGVDIFFVISGFLITGIILSEIQNG 



ORF128a and ORF128-1 show 99.5% identity in 622 aa overlap: 

orf 128a. pep 
orfl28-l 

SFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTVELSAVFLSNIYLGF 
SFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTVELSAVFLSNIYLGF 
QQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISIILFLILTA 
QQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISIILFLILTA 
TSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGKRQLLSSLC 
SSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGKRQLLSSLC 
FGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPIVFVGKISY 
FGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPIVFVGKISY 



orf 128a . pep 
orfl28-l 
orf 128a. pep 
orfl28-l 
orfl28a.pep 
orfl28-l 
orf 128a. pep 
orfl28-l 
orfl28a.pep 
orfl28-l 
orfl28a.pep 



SLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKRKMTFKKAF 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I 
SLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKRKMTFKKAF 

FCLYLAPSLILVGYNLYARGILKQEHLRPLPGAPLAAENHFPETVLTLGDSHAGHLRGFL 



I I I 



I I I I I I 



I i I 



orf 128-1 FCLYLAPSLILVGYNLYARGILKQEHLRPLPGAPLAAENHFPETVLTLGDSHAGHLRGFL 

orf 128a . pep DYVGSREGWKAKILSLDSECLVWVDEKLADNPLCRKYRDEVEKAEAVFIAQFYDLRMGGQ 

orf 128-1 DYVGSREGWKAKILSLDSECLVWVDEKLADNPLCRKYRDEVEKAEAVFIAQFYDLRMGGQ 

orf 128a. pep PVPRFEAQSFLIPGFPARFRETVKRIAAVKPVYVFANNTSISRSPLREEKLKRFAANQYL 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 128-1 PVPRFEAQSFLIPGFPARFRETVKRIAAVKPVYVFANNTSISRSPLREEKLKRFAANQYL 

orf 128a. pep RPIQAMGDIGKSNQAVFDLIKDIPNVHWVDAQKYLPKNTVEIYGRYLYGDQDHLTYFGSY 

orf 128-1 RPIQAMGDIGKSNQAVFDLIKDIPNVHWVDAQKYLPKNTVEIYGRYLYGDQDHLTYFGSY 

orf 128a . pep YMGREFHKHERLLKSSRDGALQX 

orfl28-l YMGREFHKHERLLKSSHGGALQX 

Homology with a predicted ORF from N. gonorrhoeae 

ORF128 shows 93.4% identity over 244 aa overlap with a predicted ORF (ORF128ng) from TV. 
gonorrhoeae: 



orfl28 .pep VSLASVIASQIFLYEDFNQMRKTVELSAVF 
I I I I I I I I I I I I I I I I I I I I I i I : I I I : I i 

orfl28ng ILSEIQNGSFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTIELSTVF 

orf 128 . pep LSNIYLGFQQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISI 

orfl28ng LSNIYLGFRLGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCYKKTKSLRVLRNISI 

orf 12 8 .pep ILFLILTASSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGK 

orfl28ng ILFLILTASSFLPAGFYTDILNQPNTYYLSTLRFPELLVGSLLAVYGQTQNGRRQTENGK 

orf 128 .pep RQLLSSLCFGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPI 

orfl28ng RQLLSLLCFGALLVCLFVIDKHDPFIPGITLLLPCLLTALLIRSMQYGTLPTRILSASPI 



30 
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orfl28.pep VFVGKISYSLYLYHWIFIAFAPLIRGGKQLGLPA 24 
orfl2 8ng VFVGKISYSLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKR 35 

The complete length ORF128ng nucleotide sequence <SEQ ID 833> is: 

1 ATGCAAGCTG TCCGATACAG GCCTGAAATT GACGGATTGC GGGCCGTCGC 

51 CGTGCTATCC GTCATTATTT TCCACCTGAA TAACCGCTGG CTGCCCGGAG 

101 GATTCCTGGG GGTGGACATT TTCTTTGTCA TCTCGGGATT CCTCATTACC 

151 AACATCATTC TTTCTGAAAT ACAGAACGGT TCTTTTTCTT TCCGGGATTT 

2 01 TTATACCCGC AGGATTAAGC GGATTTATCC TGCTTTTATT GCGGCCGTGT 

2 51 CCCTGGCTTC GGTGATTGCT TCTCAAATCT TCCTTTACGA AGATTTCAAC 

301 CAAATGAGGA AAACCATAGA GCTTTCTACG GTTTTTTTGT CCAATATTTA 

351 TTTGGGGTTC CGATTGGGGT ATTTCGATTT GAGTGCCGAC GAGAACCCCG 

401 TACTGCATAT CTGGTCTTTG GCGGTAGAGG AACAGTATTA CCTCCTGTAT 

451 CCTCTTTTGC TGATATTCTG TTACAAAAAA ACCAAATCAC TACGGGTGCT 

501 GCGTAATATC AGCATCATCC TGTTTCTGAT TTTGACCGCA TCATCGTTTT 

551 TGCCGGCCGG GTTTTATACC GACATCCTCA ACCAACCcaa TACTTATTAC 

601 CTTTCGACAC TGAGGTTTCC CGAGCTGTTG GTGGGTTCGC TGTTGGCGGT 

651 TTACGGGCAA ACGCAAAACG GCAGACGGCA AACAGAAAAT GGAAAACGGC 

701 AGTTGCTTTC ATTACTCTGT TTCGGCGCat tgCTTGTCTG CCTGTTCGTG 

7 51 ATCGACAAAC ACGATCCGTT TATCCCGGGA ATAACCCTGC TCCTTCCCTG 

801 CCTGCTGACG GCGCTGCTTA TCCGGAGTAT GCAATACGGG ACACTTCCGA 

851 CCCGCATCCT GTCGGCAAGC CCCATCGTAT TTGTCGGCAA AATCTCTTAT 

901 TCCCTATACC TGTACCATTG GATTTTTATT GCCTTCGCCC ATTACATTAC 

951 AGGCGACAAA CAGCTCGGAC TGCCTGCCGT ATCGGCGGTT GCCGCGTTGA 

1001 CGGCCGGATT TTCCCTGTTG AGCTATTATT TGATTGAACA GCCGCTTAGA 

1051 AAACGGAAGA TGACCTTCAA AAAGGCATTT TTCTGCCTTT ATCTCGCCCC 

1101 GTCCCTGATG CTTGTCGGTT ACAACCTGTA TTCAAGAGGG ATATTGAAAC 

1151 AGGAACACCT CCGCCCGCTG CCCGGCACGC CCGTTGCTGC GGAAAATAAT 

1201 TTTCCGGAAA CCGTCTTGAC CCTCGGCGAC TCGCACGCCG GACACCTGCG 

1251 GGGGTTTCTG GATTATGTCG GCGGCAGGGA AGGGTGGAAA GCTAAAATCC 

1301 TGTCCCTCGA TTCGGAGTGT TTGGTTTGGG TGGATGAGAA GCTGGCAGAC 

1351 AACCCGTTGT GCCGAAAATA CCGGGATGAA GTTGAAAAAG CCGAAGCTGT 

14 01 TTTCATTGCC CAATTCTATG ATTTGAGGAT GGGCGGCCAG CCCGTGCCGA 

14 51 GATTTGAAGC GCAATCCTTC CTGATACCCG GGTTCAAAGC CCGATTCAGG 

1501 GAAACCGTCA AGAGGATAGC CGCCGTCAAA CCTGTATATG TTTTTGCAAA 

1551 CAATACATCA ATCAGCCGTT CTCCCTTGAG GGAGGAAAAA TTGAAAAGAT 

1601 TTGCTATAAA CCAATACCTC CGGCCTATTC GGGCTATGGG CGACATCGGC 

1651 AAGAGCAATC AGGCGGTCTT TGATTTGGTT AAAGATATTC CCAATGTGCA 

17 01 TTGGGTGGAC GCACAAAAAT ACCTGCCCAA AAACACGGTC GAAATACACG 

17 51 GACGCTATCT TTACGGCGAC CAAGACCACC TGACCTATTT CGGTTCTTAT 

1801 TATATGGGGC GGGAATTTCA CAAACACGAA CGCCTGCTCA AGCATTCCCG 

1851 AGGCGGCGCA TTGCAGTAG 

This encodes a protein having amino acid sequence <SEQ ID 834>: 

1 MQAVRYRPE I DGLRAVAVLS VIIFHL NNRW LPGGFLG VDI FFVISGFLIT 

51 NIIL SEIQNG SFSFRDFYTR RIKRIYP AFI AAVSLASVIA SQIFL YEDFN 

101 QMRKTIELST VFLSNIYLGF RLGYFDLSAD ENPVLHIWSL AVEEQYYLLY 

151 PLLLIFCYKK TKSLRVLRN I SIILFLILTA SSFLPA GFYT DILNQPNTYY 

201 LSTLRFPELL VGSLLAVYGQ TQNGRRQTEN GKRQ LLSLLC FGALLVCLFV 

251 IDKHDP FIPG ITLLLPCLLT ALLI RSMQYG TLPTRILSA3 PIVFVGKISY 

301 SLYLYHWIFI AFAHYITGDK QLG LPAVSAV AALTAGFSLL SYYLIEQPLR 

351 KRKMTFKKAF FCLYLAPSLM LVGYNLY3RG ILKQEHLRPL PGTPVAAENN 

401 FPETVLTLGD SHAGHLRGFL DYVGGREGWK AKILSLDSEC LVWVDEKLAD 

451 NPLCRKYRDE VEKAEAVFIA QFYDLRMGGQ PVPRFEAQSF LIPGFKARFR 

501 ETVKRIAAVK PVYVFANNTS ISRSPLREEK LKRFAINQYL RPIRAMGDIG 

551 KSNQAVFDLV KDIPNVHWVD AQKYLPKNTV EIHGRYLYGD QDHLTYFGSY 

601 YMGREFHKHE RLLKHSRGGA LQ* 

ORF128ng and ORF128-1 show 95.7% identity in 622 aa overlap: 

orf 128-1 . pep MQAVRYRPEIDGLRAVAVLSVMIFHLNNRWLPGGFLGVDIFFVISGFLITGIILSEIQNG 
orfl28ng MQAVRYRPEIDGLRAVAVLSVIIFHLNNRWLPGGFLGVDIFFVISGFLITNIILSEIQNG 
orf 128-1. pep SFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTVELSAVFLSNIYLGF 
orfl28ng SFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTIELSTVFLSNIYLGF 
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orfl28-l.pep 

orfl28ng 

orfl28-l.pep 

orfl28ng 

orfl28-l.pep 

orfl28ng 

orf 128-1. pep 

orfl28ng 

orfl28-l.pep 

orfl28ng 

orf 128-1 .pep 

orfl28ng 

orfl28-l.pep 

orfl28ng 

orf 128-1. pep 

orfl28ng 

orfl28-l.pep 

orfl28ng 



QQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISIILFLILTA 

RLGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCYKKTKSLRVLRNISIILFLILTA 

SSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGKRQLLSSLC 

SSFLPAGFYTDILNQPNTYYLSTLRFPELLVGSLLAVYGQTQNGRRQTENGKRQLLSLLC 

FGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPIVFVGKISY 

FGALLVCLFVIDKHDPFIPGITLLLPCLLTALLIRSMQYGTLPTRILSASPIVFVGKISY 

SLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKRKMTFKKAF 

SLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKRKMTFKKAF 

FCLYLAPSLILVGYNLYARGILKQEHLRPLPGAPLAAENHFPETVLTLGDSHAGHLRGFL 

FCLYLAPSLMLVGYNLYSRGILKQEHLRPLPGTPVAAENNFPETVLTLGDSHAGHLRGFL 

DYVGSREGWKAKILSLDSECLVWVDEKLADHPLCRKYRDEVEKAEAVFIAQFYDLRMGGQ 

DYVGGREGWKAKILSLDSECLVWVDEKLADNPLCRKYRDEVEKAEAVFIAQFYDLRMGGQ 

PVPRFEAQSFLIPGFPARFRETVKRIAAVKPVYVFANNTSISRSPLREEKLKRFAANQYL 

PVPRFEAQSFLIPGFKARFRETVKRIAAVKPVYVFANNTSISRSPLREEKLKRFAINQYL 

RPIQAMGDIGKSNQAVFDLIKDIPNVHWVDAQKYLPKNTVEIYGRYLYGDQDHLTYFGSY 

RPIRAMGDIGKSNQAVFDLVKDIPNVHWVDAQKYLPKNTVEIHGRYLYGDQDHLTYFGSY 

YMGREFHKHERLLKSSHGGALQX 

YMGREFHKHERLLKHSRGGALQX 



610 



620 



In addition, ORF218ng shows homology to a hypothetical H. influenzae protein: 

sp|P439931Y3 92_HAEIN HYPOTHETICAL PROTEIN HI0392 >gi i 1074385 I pir 1 1 B64007 
hypothetical protein HI0392 - Haemophilus influenzae (strain Rd KW20) 
>gi 1 1573364 (U32723) H. influenzae predicted coding region HI0392 [Haemophilus 
influenzae] Length = 245 
Score = 239 bits (604), Expect = 3e-62 

Identities = 124/225 (55%), Positives = 152/225 (67%), Gaps = 1/225 (0%) 



YKK + ++VL I++ILF IL A+SF+ A FY ++L+QPN YYLS LRFPELLVGSLLA+ 



This analysis, including the identification of several putative transmembrane domains, suggests that 
these proteins firom N.meningitidis and N.gonorrhoeae, and their epitopes, could be useful antigens 
for vaccines or diagnostics, or for raising antibodies. 
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Example 99 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 835>: 

1 . . ATTATTTACG AATACCGCTG GATGTTTCTT TACGGCGCAC TGACGACCTT 

51 GGGGCTGACG GTCGTGGCAA C.GCGGGCGG TTCGGTATTG GGTCTGTTGT 

101 TGGCGTTGGC GCGCCTGATT CACTTGGAAA AAGCCGGTGC GCCGATGCGC 

151 GTGCTGGCGT GGGCGTTGCG TAAAGTTTCG CTGCTGTATG TTACGCTGTT 

201 CCGGGGTACG CCGCTGTTTG TGCAGATTGT GATTTGGGCG TATGTGTGGT 

251 TTCCGTTTTT CGTC. . 

This corresponds to the amino acid sequence <SEQ ID 836; ORF129>: 

1 . . Ili-fiYRWMFL YGALTTLGLT WAXAGGSVL GLLLALARLI HLEKAGAPMR 
51 VLAWALRKVS LLYVTLFRGT PLFVQIVIWA YVWFPFFV.. 

Further work revealed the complete nucleotide sequence <SEQ ID 837>: 



1 ATGGATTTTC GTTTTGACAT TATTTACGAA TACCGCTGGA TGTTTCTTTA 

51 CGGCGCACTG ACGACCTTGG GGCTGACGGT CGTGGCAACG GCGGGCGGTT 

101 CGGTATTGGG TCTGTTGTTG GCGTTGGCGC GCCTGATTCA CTTGGAAAAA 

151 GCCGGTGCGC CGATGCGCGT GCTGGCGTGG GCGTTGCGTA AAGTTTCGCT 

201 GCTGTATGTT ACGCTGTTCC GGGGTACGCC GCTGTTTGTG CAGATTGTGA 

251 TTTGGGCGTA TGTGTGGTTT CCGTTTTTCG TCCATCCTTC AGACGGCATT 

301 TTGGTCAGCG GCGAGGCGGC AATCGCGCTG CGTCGCGGAT ACGGGCCGCT 

351 GATTGCCGGT TCTTTGGCAC TGATCGCCAA CTCGGGGGCG TATATCTGTG 

401 AGATTTTCCG CGCGGGCATC CAGTCTATAG ACAAAGGACA GATGGAGGCG 

4 51 GCGCGTTCTT TGGGGCTGAC CTATCCGCAG GCGATGCGCT ATGTGATTCT 

501 GCCGCAGGCA TTGCGCCGCA TGCTGCCGCC TTTGGCGAGC GAGTTCATCA 

551 CGCTCTTGAA AGACAGCTCG CTGCTGTCGG TCATTGCTGT GGCGGAGTTG 

601 GCGTATGTTC AGAATACGAT TACGGGCCGG TATTCGGTTT ATGAAGAACC 

651 GCTTTACACC GTCGCCCTGA TTTATCTGTT GATGACGACT TTCTTAGGCT 

701 GGATATTCCT GCGTTTGGAA AAACGTTACA ATCCGCAACA CCGCTGA 

This corresponds to the amino acid sequence <SEQ ID 838; ORF129-l>: 



1 MDFRFDIir£ YRWMFLYGAL TTLGLT WAT AGGSVLGLLL ALA RLIHLEK 

51 AGAPMRVLAW ALRKVSLLYV TLFRGTP LFV QIVIWAYVWF PFfV HPSDGI 

101 LVSGEAAIAL RRGYGP LIAG SLALIANSGA YIC EIFRAGI QSIDKGQMEA 

151 ARSLGLTYPQ AMRYVILPQA LRRMLPPLAS E FITLLKDSS LLSVIAVA EL 

201 AYVQNTITGR YSVYEEPLYT VALIYLLMTT FLGWIF LRLE KRYNPQHR* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORE from N.meninsitidis (strain A) 

ORF129 shows 98.9% identity over a 88aa overlap with an ORE (ORF129a) from sfrain A of M 
meningitidis: 

10 20 30 40 50 

orf 12 9 .pep IIYEYRWMFLYGALTTLGLT VVAXAGGSVLGLLLALA RLIHLEKAGAPMRVLAW 

orfl2 9a MDFRFDIIYEYRWMFLYGALTTLGLT VVATAGGSVLGLLLALA RLIHLEKAGAPMRVLAW 



)rfl29.pep ALRKVSLLYVTLFRGTP LFVQIVIWAYVWFPFFV 

)rfl29a ALRKVSLLYVTLFRGTP LFVQIVIWAYVWFPFFV HPSDGILVSGEAAIALRRGYGP LIAG 



orf 129a SLALIANSGAYIC EIFRAGIQSIDKGQMEAARSLGLTYPQAMRYVILPQALRRMLPPLAS 
130 140 150 160 170 180 

The complete length ORF129a nucleotide sequence <SEQ ID 839> is: 

1 ATGGATTTTC GTTTTGACAT TATTTACGAA TACCGCTGGA TGTTTCTTTA 
51 CGGCGCACTG ACGACCTTGG GGCTGACGGT CGTGGCGACG GCGGGCGGTT 
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CGGTATTGGG 
GCCGGTGCGC 
GCTGTATGTT 
TTTGGGCGTA 
TTGGTTAGCG 
GATTGCCGGT 
AGATTTTCCG 
GCGCGTTCTT 
GCCGCAGGCA 
CGCTCTTGAA 
GCGTATGTTC 
GCTTTACACC 
GGATATTCCT 



TCTGTTGTTG 
CGATGCGCGT 
ACGCTGTTCC 
TGTGTGGTTT 
GCGAGGCGGC 
TCTTTGGCAC 
CGCGGGCATC 
TGGGGCTGAC 
TTGCGCCGTA 
AGACAGCTCG 
AGAATACGAT 
GTCGCCCTGA 
GCGTTTGGAA 



GCGTTGGCGC 
GCTGGCGTGG 
GGGGTACGCC 
CCGTTTTTCG 
AATCGCGCTG 
TGATCGCCAA 
CAGTCTATAG 
CTATCCGCAG 
TGCTGCCGCC 
CTGCTGTCGG 
TACGGGCCGG 
TTTATCTGTT 
AAACGTTACA 



GCCTGATTCA 
GCGTTGCGTA 
GCTGTTTGTG 
TCCATCCTTC 
CGTCGCGGAT 
CTCGGGGGCG 
ACAAAGGACA 
GCGATGCGCT 
TTTGGCGAGC 
TCATTGCTGT 
TATTCGGTTT 
GATGACGACT 
ATCCGCAACA 



CTTGGAAAAA 
AGGTTTCGCT 
CAGATTGTGA 
AGACGGCATT 
ACGGGCCGCT 
TATATCTGTG 
GATGGAGGCG 
ATGTGATTCT 
GAGTTCATCA 
GGCGGAGTTG 
ATGAAGAACC 
TTCTTAGGCT 
CCGCTGA 



This encodes a protein having amino acid sequence <SEQ ID 840>: 

1 MDFRFDIIYE YRWMFLYGAL TTLGLT VVAT AGGSVLGLLL ALA RLIHLEK 

51 AGAPMRVLAW ALRKVSLLYV TLFRGTP LFV QIVIWAYVWF PFFV HPSDGI 

101 LVSGEAAIAL RRGYGP LIAG SLALIANSGA YIC EIFRAGI QSIDKGQMEA 

151 ARSLGLTYPQ AMRYVILPQA LRRMLPPLAS E FITLLKDSS LLSVIAVA EL 

201 AYVQNTITGR YSVYEEPLYT VALIYLLMTT FLGWIFL RLE KRYNPQHR* 

ORF129a and ORF129-1 show 100.0% identity in 248 aa overlap: 

orf 129a . pep MDFRFDIIYEYRWMFLYGALTTLGLTWATAGGSVLGLLLALARLIHLEKAGAPMRVLAW 
orf 129-1 MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEICAGAPMRVLAW 
orf 129a . pep ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG 



orfl29-l 

orfl29a.pep 

orfl29-l 

orf 129a. pep 

orfl29-l 

orf 129a. pep 

orfl29-l 



I I I 



I I I I I 



I I I I I I 



ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG 

SLALIAN3GAYICEIFRAGIQSIDKGQMEAARSLGLTYPQAMRYVILPQALRRMLPPLAS 

SLALIANSGAYICEIFRAGIQSIDKGQMEAARSLGLTYPQAMRYVILPQALRRMLPPLAS 

EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTVALIYLLMTTFLGWIFLRLE 

EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTVALIYLLMTTFLGWIFLRLE 

KRYNPQHRX 

KRYNPQHRX 



Homology with a predicted ORF from N. gonorrhoeae 



ORF129 shows 9 
N. gonorrhoeae: 
orf 12 9. pep 
orfl29ng 
orf 12 9. pep 
orfl2 9ng 



\.9Vq identity over a 88 aa overlap with a predicted ORF (ORF129ng) from 



IIYEYRWMFLYGALTTLGLTWAXAGGSVLGLLLALARLIHLEKAGAPMRVLAW 54 

I I I ! I I I I I I I I I I I I I I I I I I I : I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
MDFRFDIIYEYRWMFLYGALTTLGLTWATAGGSVLGLLLALARLIHLEKAGAPMRVLAW 60 

ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFV 88 

I I I I ! I I I I I I I I I I I I I I I I I I I I I I I 1 I I I ! I 

ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVILHTAFLGNAMRQSRRVPDKGRWIAG 120 

An ORF129ng nucleotide sequence <SEQ ID 841 > was predicted to encode a protein having amino 
acid sequence <SEQ ID 842>: 



1 MDFRFDIIYE YRWMFLYGAL TTLGLT VVAT AGGSVLGLLL ALA RLIHLEK 

51 AGAPMRVLAW ALRKVSLLYV TLFRGTPL FV QIVIWAYVWF PFFVIL HTAF 

101 LGNAMRQSRR VPDKGRWIAG SLELNCQPRG RKTRGEFPPG ESNLGTEPRN 

151 PLSMGQRRFP GCENWYPPQN FIKK* 

Further work revealed the following gonococcal sequence <SEQ ED 843>: 

1 ATGGATTTTc gtTTTGACAT TATTTAcgaA TACCGCTGGA TGTTTCTTTA 
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51 CGGCGCACTG Acgaccttgg ggctgacggt cgtggcgacg gCGGGCGGTT 

101 CGGtattggG TCTGTTGTTG GCGTTGGCGC GCCTGATTCA CTTGGAAAAA 

151 GCCGGTGCGC CGATGCGCGT GCTGGCGTGG GCGTTGCGTA AGGTTTCGCT 

201 GCTGTACGTT ACCCTGTTCC GGGGTACGCC GCTGTTTGTG CAGATTGTGA 

251 TTTGGGCGTA TGTGTGGTTT CCGTTTTTCG TCCATCCTTC AGACGGCATT 

301 TTGGTCAGCG GCGAGGCGGC AATCGCGCTG CGTCGCGGAT ACGGGCCGCT 

351 GATTGCCGGT TCTTTGGCAC TGATCGCCAA CTCGGGGGCG TATATCTGTG 

4 01 AGATTTTCCG CGCGGGCATC CAGTCTATAG ACAAAGGACA GATGGAGGCG 

4 51 GCGTGTTCTT TGGGACTGAC CTATCCGCAG GCGATGCGCT ATGTGATTCT 

501 GCCGCAGGCA TTGCGCCGTA TGCTGCCGCC TTTGGCGAGC GAGTTCATCA 

551 CGCTCTTGAA AGACAGCTCG CTGCTGTCGG TCATTGCTGT GGCGGAGTTG 

601 GCGTATGTTC AGAATACGAT TACGGGCCGG TATTCGGTTT ATGAAGAACC 

651 GCTTTACACC GCCGCCCTGA TTTATCTGTT GATGACGACT TTCTTAGGCT 

701 GGATATTCCT GCGTTTGGAA AAACGTTACA ATCCGCAACA CCGCTGA 

This corresponds to the amino acid sequence <SEQ ID 844; ORF129ng-l>: 

1 MDFRFDIIYE YRWMFLYGAL TTLGLT WAT AGGSVLGLLL ALA RLIHLEK 

51 AGAPMRVLAW ALRKVSLLYV TLFRGTP LFV QIVIWAYVWF PFFV HPSDGI 

101 LVSGEAAIAL RRGYGP LIAG SLALIANSGA YIC EIFRAGI QSIDKGQMEA 

151 ARSLGLTYPQ AMRYVILPQA LRRMLPPLAS E FITLLKDSS LLSVIAVA EL 

201 AYVQNTITGR YSVYEEPLYT VALIYLLMTT FLGWIF LRLE KRYNPQHR* 

ORF129ng-l and ORF129-1 show 99.2% identity in 248 aa overlap: 

orf 12 9-1 . pep MDFRFDIIYEYRWMFLYGALTTLGLTWATAGGSVLGLLLALARLIHLEKAGAPMRVLAW 

orfl2 9ng-l MDFRFDIIYEYRWMFLYGALTTLGLTWATAGGSVLGLLLALARLIHLEKAGAPMRVLAW 

orf 12 9-1 . pep ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG 

orfl2 9ng-l ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG 

orf 12 9-1 . pep SLALIANSGAYICEIFRAGIQSIDKGQMEAARSLGLTYPQAMRYVILPQALRRMLPPLAS 

orfl2 9ng-l SLALIANSGAYICEIFRAGIQSIDKGQMEAACSLGLTYPQAMRYVILPQALRRMLPPLAS 

orf 12 9-1 . pep EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTVALIYLLMTTFLGWIFLRLE 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I 
orfl2 9ng-l EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTAALIYLLMTTFLGWIFLRLE 

orf 12 9-1. pep KRYNPQHRX 
I I I I I I I I I 
orfl29ng-l KRYNPQHRX 

In addition, ORF129ng-l is homologous to an ABC transporter from A.fulgidus: 

2650409 (AE001090) glutamine ABC transporter, permease protein (glnP) 
[Archaeoglobus fulgidus] Length = 224 
Score = 132 bits (329) , Expect = 2e-30 

Identities = 86/178 (48%), Positives = 103/178 (57%), Gaps = 18/178 (10%) 

Query: 65 VSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHP3DGILVSGEAAIALRRGYGPLIAGSLAL 124 

+S YV + RGTPL VQI+I +F P+ GI + E A G +AL 

Sbjct: 58 ISTAYVEVIRGTPLLVQILI VYFGLPAIGINLQPEPA GIIAL 99 

Query: 125 IANSGAYICEIFRAGIQ3IDKGQMEAACSLGLTYPQAMRYVILPQALRRMLPPLASEFIT 184 

3GAYI EI RAGI+SI GQMEAA SLG+TY QAMRYVI PQA R +LP L +EFI 
Sbjct: 100 SICSGAYIAEIVRAGIE3IPIGQMEAARSLGMTYLQAMRYVIFPQAFRNILPALGNEFIA 159 

Query: 185 LLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTAALIYLLMTTFLGWIFLRLEKR 242 

LLKDSSLLSVI++ EL V I P AL YL+MT L + +K+ 

Sbjct: 160 LLKDSSLLSVISIVELTRVGRQIVNTTFNAWTPFLGVALFYLMMTIPL3RLVAYSQKK 217 

This analysis, including the identification of transmembrane domains in the two proteins, suggests 
that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful 
antigens for vaccines or diagnostics, or for raising antibodies. 
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Example 100 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 845>: 

1 . . CTGAAAGAAT GCCGTCTGAA AGACCCTGTT TTTATTCCAA ATATCGTTTA 
51 TAAGAACATC GCCATTACTT TCCTGCTCTT GCACGCCGCC GCCGAACTTT 
101 GGCTGCCCGC GCAAACCGCC GGTTTTACCG CGCTCGCCGT CGGCTTCATC 
151 CTGCTCGCCA AGCTGCGTGA gCTTCACCAT CACGAACTCT TACGTAAACA 
201 cTACGTCCGC ACTTATTACy TGCTCCAACT CTTTGCCGCC GCAGgcTAgT 
251 TTGTGGACAG GCGCGGCGwA ATTACAAAAC CTGCCCGCyT CCGCGCCCCT 
301 GCACCTGATT ACCCTCGGCG GCATGATGGG CGGCGTGATG ATGGTGTGGc 
351 TGACCGCCGG ACTGTGGCAC AGCGGCTTTA CCAAACTCGA CTACCCCAAA 
4 01 CTCTGCCGCA TTGCCGTCCC CATCCTTTTC GCCGCCGCCG TCTCGCGCGC 
4 51 TTTCTTGrTG AACGTGAACC CGrTATTTTT CATTACCGTT CCTGCGATTC 
501 TGACCGCCGG CGTATTCGTA CTGTATCTTT TCrCGTTTAT ACCGATATTT 
551 CGGGCGAATG CGTTTACAGA CGATCCGGAr TAr 

This corresponds to the amino acid sequence <SEQ ID 846; ORF130>: 

1 . . LK£CRLKDPV FIPNIVYECNI AITFLLLHAA AELWLPAQTA GFTALAVGFI 
51 LLAKLRELHH HELLRKHYVR TYYLLQLFAA AGSLWTGAAX LQNLPASAPL 
101 HLITLGGMMG GVMMVWLTAG LWHSGFTKLD YPKLCRIAVP ILFAAAVSRA 
151 FLXNVNPXFF ITVPAILTAA VFVLYLFXFI PIFRANAFTD DPE* 

Further work revealed the complete nucleotide sequence <SEQ ID 847>: 

1 ATGCGGCCGT TTTTCGTCGG CGCGGCGGTG CTTGCCATAC TCGGTGCGCT 

51 GGTGTTTTTC ATCAACCCCG GTGCCATCGT CCTGCACCGC CAAATTTTCT 

101 TGGAACTTAT GCTGCCGGCG GCATACGGCG GTTTTTTGAC TGCGGCTTTG 

151 TTGGACTGGA CGGGTTTTTC GGGTAACCTG AAACCTGTCG CGACTTTGAT 

201 GGCGGCATTA TTGCTCGCCG CATCCGCTAT ACTGCCCTTT TCGCCGCAAA 

251 CTGCCTCGTT TTTCGTCGCC GCCTATTGGC TGGTGTTGCT GCTGTTCTGC 

301 GCCCGGCTGA TTTGGCTAGA CCGAAACACC GACAACTTCG CCCTGCTAAT 

351 GTTACTTGCC GCGTTCACTG TTTTTCAGAC GGCATATGCC GTCAGCGGCG 

4 01 ATTTGAACCT GTTGCGCGCG CAAGTGCATC TAAATATGGC GGCGGTGATG 

451 TTCGTATCCG TGCGCGTCAG TATTCTTTTG GGCGCGGAAG CCCTGAAAGA 

501 ATGCCGTCTG AAAGACCCTG TTTTTATTCC AAATATCGTT TATAAAAACA 

551 TCGCCATTAC TTTCCTGCTC TTGCACGCCG CCGCCGAACT TTGGCTGCCC 

601 GCGCAAACCG CCGGTTTTAC CGCGCTCGCC GTCGGCTTCA TCCTGCTCGC 

651 CAAGCTGCGT GAGCTTCACC ATCACGAACT CTTACGTAAA CACTACGTCC 

701 GCACTTATTA CCTGCTCCAA CTCTTTGCCG CCGCAGGCTA TTTGTGGACA 

751 GGCGCGGCGA AATTACAAAA CCTGCCCGCC TCCGCGCCCC TGCACCTGAT 

801 TACCCTCGGC GGCATGATGG GCGGCGTGAT GATGGTGTGG CTGACCGCCG 

851 GACTGTGGCA CAGCGGCTTT ACCAAACTCG ACTACCCCAA ACTCTGCCGC 

901 ATTGCCGTCC CCATCCTTTT CGCCGCCGCC GTCTCGCGCG CTTTCTTGAT 

951 GAACGTGAAC CCGATATTTT TCATTACCGT TCCTGCGATT CTGACCGCCG 

1001 CCGTATTCGT ACTGTATCTT TTCACGTTTA TACCGATATT TCGGGCGAAT 

1051 GCGTTTACAG ACGATCCGGA ATAA 

This corresponds to the amino acid sequence <SEQ ID 848; ORF130-1>: 

1 MRPFFVGAAV LAILGALVFF INPGAIVLHR QIFLELMLPA AYGGFLTA AL 

51 LDWTGFSGNL KP VATLMAAL LLAASAILP F SPQT ASFFVA AYWLVLLLFC 

101 ARLIWLDRNT DNF ALLMLLA AFTVFQTAYA V SGDLNLLRA QVHLN MAAVM 

151 FVSVRVSILL GA EALffECRL KDPVFIPNIV YKN IAITFLL LHAAAELWLP 

201 AQ TAGFTALA VGFILLAKL R ELHHHELLRK HYVRTYYLLQ LFAAAGYLWT 

251 GAAKLQNLPA SAPLH LITLG GMMGGVMMVW LT AGLWHSGF TKLDYPKLCR 

301 lAVPILFAAA VSRAFLMNVN PIFFITVPAI LTAAVFVLYL FTFIPIFRAN 



Computer analysis of this amino acid sequence gave the following results: 
Homologv with a predicted ORE from N.meninsitidis ("strain A) 

ORF130 shows 94.3% identity over a 193aa overlap with an ORE (ORF130a) from strain A of A': 



meningitidis: 
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LKECRLKDPVFIPNIVYKNIAITFLLLHAA 
LNLLRAQVHLNMAAVMFVSVRVSILLGAEALKECRLKDPVFIPNWYKNIAITFLLLHAA 



orf 130 . pep AELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQLFAAAGSLWTGAAX 
orfl30a AELWLPAQTAGFTSLAVGFILLAKLRELHHHELLRKHYVRTYYLLQLFAAAGYLWTGAAK 



irf 130 . pep LQNLPASAPLHLITLGGMMGGVMMVWLTAGLWHSGFTKLDYPKLCRIAVPILFAAAVSRA 
)rfl30a LQNLPASAPLHLITLGGMMGSVMMVWLTAGLWHSGFTKLDYPKLCRIAVPILFAAAVSRA 



FLXNVNPXFFITVPAILTAAVFVLYLFXFIPIFRANAFTDDPEX 
I I I I I I I I I I I I I I I i I 1 I I I I i ::!: I I I I I I I I I I I I I I 
VLMNVNPIFFITVPAILTAAVFVLYLLTFVPIFRANAFTDDPEX 



The complete length ORFlSOa nucleotide sequence <SEQ ID 849> is: 



ATGCGGCCGT 
GGTGTTTTTC 
TGGAACTTAT 
TTGGACTGGA 
GGCGGCATTA 
CTGCCTCGTT 
GCCCGGCTGA 
GTTACTTGCC 
ATTTGAACCT 
TTCGTATCCG 
ATGCCGTCTG 
TCGCCATTAC 
GCGCAAACCG 
CAAGCTGCGT 
GCACTTATTA 
GGCGCGGCGA 
TACCCTCGGT 
GACTGTGGCA 
ATCGCCGTCC 
GAACGTAAAC 
CCGTGTTCGT 
GCGTTTACAG 



TTTTCGTCGG 
ATCAACCCCG 
GCTGCCGGCG 
CGGGTTTTTC 
TTGCTCGCCG 
TTTCGTCGCC 
TTTGGCTAGA 
GCGTTCACTG 
GTTGCGCGCG 
TGCGCGTCAG 
AAAGACCCAG 
CTTCCTGCTC 
CCGGTTTTAC 
GAGCTTCACC 
CCTGCTCCAA 
AATTACAAAA 
GGCATGATGG 
CAGCGGCTTT 
CCATCCTNTT 
CCGATATTCT 
GCTTTACCTG 
ACGATCCGGA 



CGCGGCGGTG 
GTGCCATCGT 
GCATACGGCG 
GGGTAACCTG 
CATCCGCTAT 
GCCTATTGGC 
CCGAAACACC 
TTTTTCAGAC 
CAAGTGCATC 
TATTCTTTTG 
TATTCATCCC 
CTGCACGCCG 
CTCGCTCGCC 
ATCACGAACT 
CTCTTTGCCG 
CCTGCCCGCC 
GCAGCGTGAT 
ACCAAGCTCG 
CGCCGCCGCC 
TCATCACCGT 
CTGACATTCG 
ATAA 



CTTGCCATAC 
CCTGCACCGC 
GTTTTTTGAC 
AAACCTGTCG 
ACTGCCCTTT 
TGGTGTTGCT 
GACAACTTCG 
GGCATATGCC 
TAAATATGGC 
GGCGCGGAAG 
CAATGTCGTC 
CCGCCGAACT 
GTCGGCTTTA 
CCTGCGCAAA 
CCGCAGGCTA 
TCCGCGCCCC 
GATGGTGTGG 
ACTACCCGAA 
GTTTCGCGCG 
CCCCGCAATT 
TACCGATCTT 



TCGGTGCGCT 
CAAATTTTCT 
TGCGGCTTTG 
CGACTTTGAT 
TCGCCGCAAA 
GCTGTTCTGC 
CCCTGCTAAT 
GTCAGCGGCG 
GGCGGTGATG 
CCCTGAAAGA 
TATAAAAACA 
TTGGCTGCCT 
TCCTGCTTGC 
CACTACGTCC 
TTTGTGGACA 
TGCACCTGAT 
CTGACTGCCG 
ACTCTGCCGC 
CTGTTTTAAT 
CTGACCGCCG 
TCGGGCGAAC 



This encodes a protein having amino acid sequence <SEQ ID 850>: 

1 MRPFFVGAAV LAILGALVFF INPGAIVLHR QIFLELMLPA AYGGFLTAA L 

51 LDWTGFSGNL KP VATLMAAL LLAASAILP F SPQT ASFFVA AYWLVLLLFC 

101 ARLIWLDRNT DHF ALLMLLA AFTVFQTAYA V SGDLNLLRA QVHLN MAAVM 

151 FVSVRVSILL GA EALKECRL KDPVFIPNVV YKN IAITFLL LHAAAELWLP 

201 AQ TAGFTSLA VGFILLAKL R ELHHHELLRK HYVRTYYLLQ LFAAAGYLWT 

251 GAAKLQNLPA SAPLH LITLG GMMGSVMMVW LT AGLWHSGF TKLDYPKLCR 

301 lAVPILFAAA VSRAVLM NVN P IFFITVPAI LTAAVFVL YL LTFVPIFRAN 

351 AFTDDPE* 

ORF130a and ORF130-1 show 98.3% identity in 357 aa overlap: 

orf 130a. pep MRPFFVGAAVLAILGALVFFINPGAIVLHRQIFLELMLPAAYGGFLTAALLDWTGFSGNL 

orf 130-1 MRPFFVGAAVLAILGALVFFINPGAIVLHRQIFLELMLPAAYGGFLTAALLDWTGFSGNL 

orf 130a. pep KPVATLMAALLLAASAILPFSPQTASFFVAAYWLVLLLFCARLIWLDRNTDNFALLMLLA 

orf 130-1 KPVATLMAALLLAASAILPFSPQTASFFVAAYWLVLLLFCARLIWLDRNTDNFALLMLLA 

orf 130a. pep AFTVFQTAYAVSGDLNLLRAQVHLNMAAVMFVSVRVSILLGAEALKECRLKDPVFIPNW 
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AFTVFQTAYAVSGDLNLLRAQVHLNMAAVMFVSVRVSILLGAEALKECRLKDPVFIPNIV 

YKNIAITFLLLHAAAELWLPAQTAGFTSLAVGFILLAKLRELHHHELLRKHYVRTYYLLQ 

YKNIAITFLLLHAAAELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQ' 

LFAAAGYLWTGAAKLQNLPASAPLHLITLGGMMGSVMMVWLTAGLWHSGFTKLDYPKLCR 

LFAAAGYLWTGAAKLQNLPASAPLHLITLGGMMGGVMMVWLTAGLWHSGFTKLDYPKLCR 

lAVPILFAAAVSRAVLMNVNPIFFITVPAILTAAVFVLYLLTFVPIFRANAFTDDPH 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I : I I I I I I I M I I I I 
lAVPILFAAAVSRAFLMNVNPIFFITVPAILTAAVFVLYLFTFIPIFRANAFTDDPE 



Homology with a predicted ORF from N. gonorrhoeae 

ORF130 shows 91.7% identity over a 193 aa overlap with a predicted ORF (ORF130ng) from 
N. gonorrhoeae: 

LKECRLKDPVFIPNIVYKNIAITFLLLHAA 30 

LNLLRAQVHLNMAAVMFVSVRVSVLLGTETLKECRLKDPVFIPNVIYKNIAIT-LLLHAA 201 
AELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQLFAAAGSLWTGAAX 90 

AELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQLFAAAGYLWTGAAK 2 61 



orfl30.pep 
orflSOng 
orflSO.pep 
orflSOng 
orfiaO.pep 
orfl30ng 
orfl30.pep 
orfl30ng 



LQNLPASAPLHLITLGGMMGGVNMVWLTAGLWHSGFTKLDYPKLCRIAVPILFAAAVSRA 150 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1111:11111 
LQNLPASAPLHLITLGGMTGGVMMVWLTAGLWHSGFTKLDYPKLCRIAVSILFASAVSRA 321 



FLXNVNPXFFITVPAILTAAVFVLYLFXFIPIFRANAFTDDPE 193 
VLMNVNPIFFITVPEILTAAVFMLYLLTFVPIFRANAFTDDPE 3 64 

An ORF130ng nucleotide sequence <SEQ ID 851 > was predicted to encode a protein having amino 
acid sequence <SEQ ID 852>: 



1 MNKFFTHPAffi PFFVGA AVLA ILGALVFFHQ PRRYHPAPPN FLGTYAAGCI 

51 RRFFDYRFVG PDGFFRQPET CRYFDG GWA CCGCFIAVFT ATC RIFRRRL 

101 LAGVAAVLRL ADLARRQHRT LRSVDVTAAF TVFQTAYAVS GDLNLLRAQV 

151 H LNMAAVMFV SVRVSVLL GT ETLKECRLKD P VFIPNVIYK NIAITLLL HA 

201 AAELWLPAQ T AGFTALAVGF ILLAKL RELH HHELLRKHYV RTYYLLQLFA 

251 AAGYLWTGAA KLQNLPASAP LHLITLGGMT GGVMMVWLTA GLWHSGFTKL 

301 DYPKLCR IAV SILFASAVSR AVLM NVMPIF FITVPE ILTA AVFMLYLLTF 

351 VPIFRANAFT DOPE* 

Further work revealed the following gonococcal DNA sequence <SEQ ID 853>: 



ATGCGCCCGT 
GGTGTTTTTT 
TGGAACTTAT 
TTGGACCGGA 
GGCGGTGTTG 
TTGCCGCATT 
GCCTGGCTGA 
GTTACTTGCC 
ATTTGAACTT 
TTCGTATCCG 
ATGCCGTCTG 
TCGCCATCAC 
CAAACCGCCG 
GCTGCGCGAA 
CTTATTACCT 
GCGGCGAAAC 
CCTCGGCGGC 
TGTGGCACAG 



TTTTCGTCGG 
ATCAACCCCG 
GCTGCCGGCT 
CGGGTTTTTC 
TTGCTTGTTG 
TTTCGTCGCC 
TTTGGCTCGA 
GCATTTACCG 
ACTGCGCGCG 
TCCGCGTCAG 
AAAGACCCCG 
CCTGCTGCTG 
GTTTTACTGC 
CTGCACCATC 
GCTCCAGCTC 
TGCAAAACCT 
ATGACGGGTG 
CGGCTTTACC 



TGCGGCAGTA 
GCGCTATCAT 
GCATACGGCG 
AGGCAACCTG 
CGGCTGTTTT 
GCCTATTGGC 
CCGCAACACC 
TTTTTCAGAC 
CAAGTGCATT 
CGTCCTTTTG 
TATTCATCCC 
CACGCCGCCG 
GCTTGCCGTC 
ACGAACTCTT 
TTTGCCGCCG 
GCCCGCCTCC 
GCGTGATGAT 
AAACTCGACT 



CTTGCCATAC 
CCTGCACCGC 
GTTTTTTGAC 
AAACCTGCCG 
ATTGCCGTTT 
TGGTGTTGCT 
GACAACTTCG 
GGCCTATGCC 
TGAATATGGC 
GGCACGGAAA 
CAACGTTATC 
CCGAACTTTG 
GGCTTCATCC 
ACGCAAACAC 
CAGGTTATCT 
GCGCCCCTGC 
GGTGTGGCTG 
ACCCGAAACT 



TCGGTGCGTT 
CAAATTTTCT 
TACCGCTTTG 
CTACTTTGAT 
TTACCGCAAC 
GCTGTTCTGC 
CTCTGTTGAT 
GTCAGCGGCG 
GGCGGTCATG 
CCCTGAAAGA 
TATAAAAACA 
GCTGCCCGCG 
TGCTCGCCAA 
TACGTCCGCA 
GTGGACAGGC 
ACCTGATTAC 
ACTGCCGGAC 
CTGCCGCATC 
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901 GCCGTCTCCA TCCTTTTCGC CTCCGCCGTT TCGCGCGCTG TTTTAATGAA 

951 CGTGAATCCG ATATTCTTCA TCACCGTTCC CGAGATTCTG ACCGCCGCCG 

1001 TGTTCATGCT TTACCTGCTG ACGTTCGTAC CGATTTTTCG AGCGAACGCG 

1051 TTTACAGACG ATCCGGAATA A 

This corresponds to the amino acid sequence <SEQ ID 854; ORF130ng-l>: 

1 MRPF FVGAAV LAILGALVFF I NPGAIILHR QIFLELMLPA AYGGFLTTAL 

51 LDRTGFSGNL KPA ATLMAVL LLVAAVLLPF L PQ LAAFFVA AYWLVLLLFC 

101 AWLIWLDRNT DNF ALLMLLA AFTVFQTAYA V SGDLNLLRA QVH LMMAAVM 

151 FVSVRVSVLL GTETLKECRL KDP VFIPNVI YKNIAITLLL HAAAELWLPA 

201 Q TAGFTALAV GFILLAKL RE LHHHELLRKH YVRTYYLLQL FAAAGYLWTG 

251 AAKLQNLPAS APLHLITLGG MTGGVMMVWL TAGLWHSGFT KLDYPKLCRI 

301 AVSILFASAV SRAVLM NVNP IFFITVPE IL TAAVFMLYLL TFVPI FRAKA 

351 FTDDPE* 

ORF130ng-l and ORF130-1 show 92.4% identity in 357 aa overlap: 

orf 130-1 . pep MRPFFVGAAVLAILGALVFFINPGAIVLHRQIFLELMLPAAYGGFLTAALLDWTGFSGNL 

orfl30ng-l MRPFFVGAAVLAILGALVFFINPGAIILHRQIFLELMLPAAYGGFLTTALLDRTGFSGNL 

orf 130-1 . pep KPVATLMAALLLAASAILPFSPQTA3FFVAAYWLVLLLFCARLIWLDRNTDNFALLMLLA 

orfl30ng-l KPAATLMAVLLLVAAVLLPFLPQLAAFFVAAYWLVLLLFCAWLIWLDRNTDNFALLMLLA 

orf 130-1 . pep AFTVFQTAYAVSGDLNLLRAQVHLNMAAVMFVSVRVSILLGAEALKECRLKDPVFIPNIV 

orfl30ng-l AFT VFQTAYAVS GDLNLLRAQVHLNMAAVMFVS VRVS VLLGTETLKECRLKDPVFI PNVI 

orf 130-1. pep YECNIAITFLLLHAAAELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQ 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orfl30ng-l YKNIAIT-LLLHAAAELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQ 

o^-f 130-1 . pep LFAAAGYLWTGAAKLQNLPASAPLHLITLGG^^MGGVMMVWLTAGLWHSGFTKLDYPKLCR 

orfl30ng-l LFAAAGYLWTGAAKLQNLPASAPLHLITLGGMTGGVMMVWLTAGLWHSGFTKLDYPKLCR 

orf 130-1 . pep lAVPILFAAAVSRAFLMNVNPIFFITVPAILTAAVFVLYLFTFIPIFRANAFTDDPEX 

orfl30ng-l lAVSILFASAVSRAVLMNVNPIFFITVPEILTAAVFMLYLLTFVPIFRANAFTDDPEX 

Based on this analysis, it is predicted that the proteins from N. meningitidis and N.gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 101 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 855>: 

1 ATGGAAATTC GGGCAATAAA ATATACGGCA ATGGCTGCGT TGCTTGCATT 

51 TACGGTTGCA GGCTGCCGGC TGGCGGGGTG GTATGAGTGT TCGTCCCTCA 

101 CCGGCTGGTG TAAGCCGAGA AAACCGGCTG CCATCGATTT TTGGGATATT 

151 GGCGGCGAGA GTCCGCCGTC TTTAGGGGAC TACGAGATAC CGCTTTCAGA 

201 CGGCAATAGT TCCGTCAGGG CAAACGAATA TGAATCCGCA CAACAATCTT 

251 ACTTTTACAG GAAAATAGGG AAGTTTGAAG C.TGCGGGCT GGATTGGCGT 

301 ACGCGTGACG GCAAACCTTT GATTGAGACG TTCAAACAGG GAGGATTTGA 

351 CTGCTTGGAA AAG. . 

This corresponds to the amino acid sequence <SEQ ID 856; 0RF131>: 

1 MEIRAIKYTA MAALLAFTVA GCRLAGWYEC SSLTGWCKPR KPAAIDFWDI 
51 GGESPPSLGD YEIPLSDGNS SVRANEYESA QQSYFYRKIG KFEXCGLDWR 
101 TRDGKPLIET FKQGGFDCLE K.. 

Further work revealed the complete nucleotide sequence <SEQ ID 857>: 



1 ATGGAAATTC GGGCAATAAA ATATACGGCA ATGGCTGCGT TGCTTGCATT 
51 TACGGTTGCA GGCTGCCGGC TGGCGGGGTG GTATGAGTGT TCGTCCCTCA 
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101 CCGGCTGGTG TAAGCCGAGA AAACCGGCTG CCATCGATTT TTGGGATATT 

151 GGCGGCGAGA GTCCGCCGTC TTTAGGGGAC TACGAGATAC CGCTTTCAGA 

201 CGGCAATCGT TCCGTCAGGG CAAACGAATA TGAATCCGCA CAACAATCTT 

251 ACTTTTACAG GAAAATAGGG AAGTTTGAAG CCTGCGGGCT GGATTGGCGT 

301 ACGCGTGACG GCAAACCTTT GATTGAGACG TTCAAACAGG GAGGATTTGA 

351 CTGCTTGGAA AAGCAGGGGT TGCGGCGCAA CGGTCTGTCC GAGCGCGTCC 

401 GATGGTAA 

This corresponds to the amino acid sequence <SEQ ID 858; 0RF131-1>: 



MEIRAIKYTA 



_GCRLAGWYEC SSLTGWCKPR KPAAIDFWDI 



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

0RF131 shows 95.0% identity over a 121aa overlap with an ORF (0RF131a) from strain A of A': 
1 5 meningitidis: 

10 20 30 40 50 60 

orfl31.pep MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDIGGESPPSLGD 

orfl31a MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLSGWCKPRKPAAIDFWDIGGESPPSLED 



YEIPLSDGNSSVRANEYESAQQSYFYRKIGKFEXCGLDWRTRDGKPLIETFKQGGFDCLE 
I I I I I M I I I I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I : 
YEIPLSDGNRSVRANEYESAQQSYFYRKIGKFEACGLDWRTRDGKPLIETFKQEGFDCLK 



orfl31.pep K 

I 

orfl31a KQGLRRNGLSERVRWX 
130 

The complete length ORFlBla nucleotide sequence <SEQ ID 859> is: 

1 ATGGAAATTC GGGCAATAAA ATATACGGCA ATGGCTGCGT TGCTTGCATT 

51 TACGGTTGCA GGCTGCCGGT TGGCAGGTTG GTATGAGTGT TCGTCCCTGT 

101 CCGGCTGGTG TAAGCCGAGA AAACCTGCCG CCATCGATTT TTGGGATATT 

151 GGCGGCGAGA GTCCTCCGTC TTTAGAGGAC TACGAGATAC CGCTTTCAGA 

201 CGGCAATCGT TCCGTCAGGG CAAACGAATA TGAATCCGCA CAACAATCTT 

251 ACTTTTACAG GAAAATAGGG AAGTTTGAAG CCTGCGGGTT GGATTGGCGT 

301 ACGCGTGACG GCAAACCTTT GATTGAGACG TTCAAACAGG AAGGTTTTGA 

351 TTGTTTGAAA AAGCAGGGGT TGCGGCGCAA CGGTCTGTCC GAGCGCGTCC 

401 GATGGTAA 

This encodes a protein having amino acid sequence <SEQ ID 860>: 

1 MEIRAIKYTA MAALLAFTVA G CRLAGWYEC SSLSGWCKPR KPAAIDFWDI 
51 GGESPPSLED YEIPLSDGNR SVRANEYESA QQSYFYRKIG KFEACGLDWR 
101 TRDGKPLIET FKQEGFDCLK KQGLRRNGLS ERVRW* 

ORFlSla and 0RF131-1 show 97.0% identity in 135 aa overlap: 

orf 131a. pep MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLSGWCKPRKPAAIDFWDIGGESPPSLED 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I M I I I I I I I I I I I I I I I I I I I I I I 
orf 131-1 MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDIGGESPPSLGD 

orf 131a. pep YEIPLSDGNRSVRANEYESAQQSYFYRKIGKFEACGLDWRTRDGKPLIETFKQEGFDCLK 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I Mill: 
orf 131-1 YEIPLSDGNRSVRANEYESAQQSYFYRKIGKFEACGLDWRTRDGKPLIETFKQGGFDCLE 

orf 131a. pep KQGLRRNGLSERVRWX 
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orfl31-l KQGLRRNGLSERVRWX 

Homology with a predicted ORF from N.sonorrhoeae 

0RF131 shows 89.3% identity over 121 aa overlap with a predicted ORF (0RF131ng) from 
N. gonorrhoeae: 



orfl31.pep 
orfl31ng 
orfl31.pep 
orf 131ng 
orfl31.pep 
orf 131ng 



MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDIGGESPPSLGD 60 

MEIRVIKYTATAALFAFTVAGCRLAGWYECLSLSGWCKPRKPAAIDFWDIGGESPLSLED 60 

YE I PLS DGNS S VRANE YES AQQS YFYRKI GKFEXCGLDWRTRDGKPLIET FKQGGFDCLE 120 

YEIPLSDGNRSVRANEYESAQKSYFYRKIGKFEACGLDWRTRDGKPLVERFKQEGFDCLE 120 
K 121 



KQGLRRNGLSERVRW 134 

A complete length 0RF131ng nucleotide sequence <SEQ ID 861> was predicted to encode a 
protein having amino acid sequence <SEQ ID 862>: 

1 MEIRVIKYTA TAALFAFTVA GC RLAGWYEC LSLSGWCKPR KPAAIDFWDI 
51 GGESPLSLED YEIPLSDGNR SVRANEYESA QKSYFYRKIG KFEACGLDWR 
101 TRDGKPLVER FKQEGFDCLE KQGLRRNGLS ERVRW* 

Further work revealed the following gonococcal DNA sequence <SEQ ID 863>: 



1 ATGGAAATTC GGGTAATAAA ATATACGGCA ACGGCTGCGT TGTTTGCATT 

51 TACGGTTGCA GGCTGCCGGC TGGCGGGGTG GTATGAGTGT TCGTCCTTGT 

101 CCGGCTGGTG TAAGCCGAGA AAACCTGCCG CCATCGATTT TTGGGATATT 

151 GGCGGCGAGA GtCCgctGTC TTTAGAGGAC TACGAGATAC CGCTTTCAGA 

201 CGGCAATCGT TCCGTCAGGG CAAACGAATA TGAATCCGCG CAAAAATCTT 

251 ACTTTTATAG GAAAATAGGG AAGTTTGAAG CCTGCGGGTT GGATTGGCGT 

301 ACGCGTGACG GCAAACCTTT GGTTGAGAGG TTCAAACAGG AAGGTTTCGA 

351 CTGTTTGGAA AAGCAGGGGT TGCGGCGCAA CGGCCTGTCC GAGCGCGTCC 

401 GATGGTAA 

This corresponds to the amino acid sequence <SEQ ID 864; 0RF131ng-l>: 



1 MEIRVIKYTA TAALFAFTVA G CRLAGWYEC SSLSGWCKPR KPAAIDFWDI 
51 GGESPLSLED YEIPLSDGNR SVRANEYESA QKSYFYRKIG KFEACGLDWR 
101 TRDGKPLVER FKQEGFDCLE KQGLRRNGLS ERVRW* 

0RF131ng-l and 0RF131-1 show 92.6% identity in 135 aa overlap: 

orf 131ng-l . pep MEIRVIKYTATAALFAFTVAGCRLAGWYECSSLSGWCKPRKPAAIDFWDIGGESPLSLED 
orf 131-1 MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDIGGESPPSLGD 



orfl31ng-l.pep YEIPLSDGNRSVRANEYESAQKSYFYRKIGKFEACGLDWRTRDGKPLVERFKQEGFDCLE 
orf 131-1 YEIPLSDGNRSVRANEYESAQQSYFYRKIGKFEACGLDWRTRDGKPLIETFKQGGFDCLE 
orfl31ng-l .pep KQGLRRNGLSERVRWX 
orfl31-l KQGLRRNGLSERVRWX 

Based on the presence of a predicted prokaryotic membrane lipoprotein lipid attachment site, it is 
predicted that the proteins from N. meningitidis and N.gonorrhoeae, and their epitopes, could be 
useftil antigens for vaccines or diagnostics, or for raising antibodies. 
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Example 102 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 865> 

1 ATGAAACACA TCCATATTAT CGGTATCGGC GGCACGTTTA TGGGCGGGCT 

51 TGCCGCCATT GCCAAAGAAG CGGGGTTTGA AGTCAGCGGT TGCGACGCGA 

101 AGATGTATCC GCCGATGAGC ACCCAGCTCG AAGCCTTGGG TATAGACGTG 

151 TATGAAGGCT TCGATGCCGC TCAGTTGGAC GAATTTAAAG CCGACGTTTA 

201 CGTTATCGGC AATGTCGCCA AGCGCGGGAT GGATGTGGTT GAAGCGATTT 

251 TGAACCTCGG CCTGCCtTAT ATtTcCGGCC CGCAATGGCT GTCGGAAAAC 

301 GTGCTGCACC ATCATTGGGT ACTCGGTGTG GCGGGGACgC ACGGCAAAAC 

351 GACCACCGCC TCCATGCTCG CATGGGTCTT GGAATATgCC GGCCTCGCGC 

401 CGGGCTTCCT TATtGGCGGC GTACC.GGAA AATttCGGCG TTTCCGCCCG 

451 CCTGCCGCAA ACGCCGCGCC AAGACCCGAA CAGCCAATCG CCGTTTTTcG 

501 TCATCGAAGC CGACGAATAC GACACCGCCT TTtTCGACAA ACGTTCTAAA 

551 TtCGTGCATT ACCGTCCGCG TACCGCCGTG TTGAACAATC TGGAATTCGA 

601 CCACGCCGAC ATCTTTGCCG ACTTGGGCGC GATACAGACo CAGTTCCACT 

651 ACCTCGTGCG TACCGTGCCG TCTGAAGGCT TAATCGTCTG CAACGGACGG 

701 CAGCAAAGCC TGCAAGATAC TTTGGACAAA GGCTGCTGGA CGCCGGTGGA 

751 AAAATTCGGC ACGGAACACG GCTGGCA. . 

This corresponds to the amino acid sequence <SEQ ID 866; ORF132>: 

1 MKHIHIIGIG GTFMGGLAAI AKEAGFEVSG CDAKMYPPMS TQLEALGIDV 

51 YEGFDAAQLD EFKADVYVIG NVAKRGMDW EAILNLGLPY ISGPQWLSEN 

101 VLHHHWVLGV AGTHGKTTTA SMLAWVLEYA GLAPGFLIGG VXGKFRRFRP 

151 PAANAAPRPE QPIAVFRHRS RRIRHRLFRQ TFXIRALPSA YRRVEQSGIR 

201 PRRHLCRLGR DTDPVPLPRA YRAVXRLNRL QRTAAKPARY FGQRLLDAGG 

251 KIRHGTRLA. . 

Further work revealed the complete nucleotide sequence <SEQ ID 867>: 

1 ATGAAACACA TCCATATTAT CGGTATCGGC GGCACGTTTA TGGGCGGGCT 

51 TGCCGCCATT GCCAAAGAAG CGGGGTTTGA AGTCAGCGGT TGCGACGCGA 

101 AGATGTATCC GCCGATGAGC ACCCAGCTCG AAGCCTTGGG TATAGACGTG 

151 TATGAAGGCT TCGATGCCGC TCAGTTGGAC GAATTTAAAG CCGACGTTTA 

201 CGTTATCGGC AATGTCGCCA AGCGCGGGAT GGATGTGGTT GAAGCGATTT 

251 TGAACCTCGG CCTGCCTTAT ATTTCCGGCC CGCAATGGCT GTCGGAAAAC 

301 GTGCTGCACC ATCATTGGGT ACTCGGTGTG GCGGGGACGC ACGGCAAAAC 

351 GACCACCGCC TCCATGCTCG CATGGGTCTT GGAATATGCC GGCCTCGCGC 

401 CGGGCTTCCT TATTGGCGGC GTACCGGAAA ATTTCGGCGT TTCCGCCCGC 

451 CTGCCGCAAA CGCCGCGCCA AGACCCGAAC AGCCAATCGC CGTTTTTCGT 

501 CATCGAAGCC GACGAATACG ACACCGCCTT TTTGGACAAA CGTTCTAAAT 

551 TCGTGCATTA CCGTCCGCGT ACCGCCGTGT TGAACAATCT GGAATTCGAC 

601 CACGCCGACA TCTTTGCCGA CTTGGGCGCG ATACAGACCC AGTTCCACTA 

651 CCTCGTGCGT ACCGTGCCGT CTGAAGGCTT AATCGTCTGC AACGGACGGC 

701 AGCAAAGCCT GCAAGATACT TTGGACAAAG GCTGCTGGAC GCCGGTGGAA 

751 AAATTCGGCA CGGAACACGG CTGGCAGGCC GGCGAAGCCA ATGCCGACGG 

801 CTCGTTCGAC GTGTTGCTCG ACGGCAAAAC CGCCGGACGC GTCAAATGGG 

851 ATTTGATGGG CAGGCACAAC CGCATGAACG CGCTCGCCGT CATTGCCGCC 

901 GCGCGTCATG TCGGTGTCGA TATTCAGACC GCCTGCGAAG CCTTGGGCGC 

951 GTTTAAAAAC GTCAAACGCC GGATGGAAAT CAAAGGCACG GCAAACGGCA 

1001 TCACCGTTTA CGACGACTTC GCCCACCACC CGACCGCCAT CGAAACCACG 

1051 ATTCAAGGTT TGCGCCAACG CGTCGGCGGC GCGCGCATCC TCGCCGTCCT 

1101 CGAACCGCGT TCCAACACGA TGAAGCTGGG CACGATGAAG TCCGCCCTGC 

1151 CTGTAAGCCT CAAAGAAGCC GACCAAGTGT TCTGCTACGC CGGCGGCGTG 

1201 GACTGGGACG TCGCCGAAGC CCTCGCGCCT TTGGGCGGCA GGCTGAACGT 

1251 CGGCAAAGAC TTCGATGCCT TCGTTGCCGA AATCGTGAAA AACGCCGAAG 

1301 TAGGCGACCA TATTTTGGTG ATGAGCAACG GCGGTTTCGG CGGAATACAC 

1351 GGAAAGCTGC TGGAAGCTTT GAGATAG 

This corresponds to the amino acid sequence <SEQ ID 868; ORF132-l>: 

1 MKHIHIIGIG GTFMGGLAAI A KEAGFEVSG CDAKMYPPMS TQLEALGIDV 

51 YEGFDAAQLD EFKADVYVIG NVAKRGMDW EAILNLGLPY ISGPQWLSEN 

101 VLHHHWVLGV AGTHGKTTTA SMLAWVLEYA GLAPGFLIGG VPENFGVSAR 

151 LPQTPRQDPN SQSPFFVIEA DEYDTAFFDK RSKFVHYRPR TAVLNNLEFD 

201 HADIFADLGA IQTQFHYLVR TVPSEGLIVC NGRQQSLQDT LDKGCWTPVE 

251 KFGTEHGWQA GEANADGSFD VLLDGKTAGR VKWDLMGRHN RMNALAVIAA 

301 ARHVGVDIQT ACEALGAFKN VKRRMEIKGT ANGITVYDDF AHHPTAIETT 
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351 IQGLRQRVGG ARILAVLEPR SNTMKLGTMK SALPVSLKEA DQVFCYAGGV 
401 DWDVAEALAP LGGRLNVGKD FDAFVAEIVK NAEVGDHILV MSNGGFGGIH 
451 GKLLEALR* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with the hypothetical o457 protein of E.coli (accession number U14003) 
ORF132 and o457 show 58% aa identity in 140 aa overlap: 



IHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLDEFK 63 
IHI+GI GTFMGGLA +A++ G EV+G DA +YPPMST LE GI++ +G+DA+QL+ + 
IHILGICGTFMGGLAMLARQLGHEVTGSDANVYPPMSTLLEKQGIELIQGYDASQLEP-Q 61 



Orfl32: 
o457: 
Orfl32: 
o457: 
Orfl32: 
o457: 

Homology with a predicted ORF from N.meninsitidis (strain K) 

ORF132 shows 74.6% identity over a 189aa overlap with an ORF (ORF132a) from strain A ofN. 
meningitidis: 



ADVYVIGNVAKRGMDWEAILNLGLPYISGPQWLSENVLHHHWVLGVAGTHGKTTTASML 123 

D+ +IGN KG VEA+L +PY+SGPQWL + VL WVL VAGTHGKTTTA M 
PDLVIIGNAMTRGNPCVEAVLEKNIPYMSGPQWLHDFVLRDRWVLAVAGTHGKTTTAGMA 121 

124 AWVLEYAGLAPGFLIGGVXG 143 

W+LE G PGF+IGGV G 
122 TWILEQCGYKPGFVIGGVPG 141 



20 



.rf 132. pep 
.rfl32a 



30 



50 



60 



MKHIHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLD 
MKHIHIIGIGGTFMGGIAAIAKEAGFEXSGCDAKMYPPMSTQLEALGIGVYEGFDTAQLD 



orfl32.pep 
orfl32a 



EFKADVYVIGNVAKRGMDWEAILNLGLPYISGPQWLSENVLHHHWVLGVAGTHGKTTTA 
EFKADVYVIGNVAKRGMDWEAILNRGLPYISGPQWLAENXLHHHWXLGVAXTHGKTTTA 



orfl32.pep 
orfl32a 



SMLAWVLEYAGLAPGFLIGGVXGKFR RFRPPAANAAPRPEQPI AVFR 

SMLAWVLEYAGLAPGFXIGGVPENFSVSARL-PQTPRQDPNSQSPFFVIEADEYDTAFFD 



170 180 190 200 210 220 

HRSRRIRHRLFRQTFXIRALPSAYRRVEQSGIRPRRHLCRLGRDTDPVPLPRAYRAVXRL 

KRSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHHLVRTVPSEGLIVCNGRQQSLQD 
180 190 200 210 220 230 

The complete length ORF132a nucleotide sequence <SEQ ID 869> is: 



orfl32.pep 
orfl32a 



1 ATGAAACACA TCCACATTAT CGGTATCGGC GGCACGTTTA TGGGTGGGAT 

51 TGCCGCCATT GCCAAAGAAG CAGGGTTTGA ANTCAGCGGT TGCGATGCGA 

101 AGATGTATCC GCCGATGAGC ACCCAGCTCG AAGCCTTGGG CATAGGCGTG 

151 TATGAAGGCT TCGACACCGC GCAGTTGGAC GAATTTAAAG CCGACGTTTA 

201 CGTTATCGGC AATGTCGCCA AGCGCGGGAT GGATGTGGTT GAAGCGATTT 

251 TGAACCGTGG GCTGCCTTAT ATTTCCGGCC CGCAATGGCT GGCTGAAAAC 

301 NTGCTGCACC ATCATTGGNN ACTCGGCGTG GCGGNGACGC ACGGCAAAAC 

351 GACCACCGCG TCTATGCTCG CGTGGGTTTT GGAATATGCC GGACTCGCAC 

401 CGGGCTTCNT TATCGGCGGC GTACCGGAAA ACTTCAGCGT TTCCGCCCGC 

451 CTGCCGCAAA CGCCGCGCCA AGACCCGAAC AGCCAATCGC CGTTTTTCGT 

501 CATTGAAGCC GACGAATACG ACACCGCGTT TTTCGACAAA CGCTCCAAAT 

551 TCGTGCATTA CCGTCCGCGT ACCGCCGTGT TGAACAATCT GGAATTCGAC 

601 CACGCCGACA TCTTCGCCGA TTTGGGCGCG ATACAGACCC AGTTCCACCA 

651 CCTCGTGCGT ACCGTGCCGT CTGAAGGCCT CATCGTCTGC AACGGACGGC 

7 01 AGCAAAGCCT GCAAGACACT TTGGACAAAG GCTGCTGGAC GCCGGTGGAA 

751 AAATTCGGCA CGGAACACGG CTGGCAGGCC GGCGAAGCCA ATGCCGATGG 
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801 CTCGTTCGAC GTGTTGCTTG ACGGCAAAAA AGCCGGACAC GTCGCTTGGA 

851 GTTTGATGGG CGGACACAAC CGCATGAACG CGCTCGCNGT CATCGCCGCC 

901 GCGCGTCATG CCGGAGTNGA CATTCAGACG GCCTGCGAAG CCTTGAGCAC 

951 GTTTAAAAAC GTCAAACGCC GCATGGAAAT CAAAGGCACG GCAAACGGTA 

1001 TCACCGTTTA CGACGACTTC GCCCACCATC CGACCGCTAT CGAAACCACG 

1051 ATTCAAGGTT TGCGCCAGCG CGTCGGCGGC GCGCGCATCC TCGCCGTCCT 

1101 CGAACCGCGT TCCAATACGA TGAAGCTGGG TACGATGAAA GCCGCCCTGC 

1151 CCGCAAGCCT CAAAGAAGCC GACCAAGTGT TCTGNTACGC CGGCGGCGCG 

1201 GACTGGGACG TTGCCGAAGC CCTCGCGCCT TTGGGCGGCA GGCTGCACGT 

1251 CGGCAAAGAC TTCGATGCCT TCGTTGCCGA AATCGTGAAA AACGCCGAAG 

1301 CAGGCGACCA TATTTTGGTG ATGAGCAACG GCGGTTTCGG CGGAATACAC 

1351 ACCAAACTGC TGGACGCTTT GAGATAG 

This encodes a protein having amino acid sequence <SEQ ID 870>: 



1 MKHIHIIGIG GTFMGGIAAI A KEAGFEXSG CDAKMYPPMS TQLEALGIGV 

51 YEGFDTAQLD EFKADVYVIG NVAKRGMDW EAILNRGLPY ISGPQWLAEN 

101 XLHHHWXLGV AXTHGKTTTA SMLAWVLEYA GLAPGFXIGG VPENFSVSAR 

151 LPQTPRQDPN SQSPFFVIEA DEYDTAFFDK RSKFVHYRPR TAVLNNLEFD 

201 HADIFADLGA IQTQFHHLVR TVPSEGLIVC NGRQQSLQDT LDKGCWTPVE 

251 KFGTEHGWQA GEANADGSFD VLLDGKKAGH VAWSLMGGHN RMNALAVIAA 

301 ARHAGVDIQT ACEALSTFKN VKRRMEIKGT ANGITVYDDF AHHPTAIETT 

351 IQGLRQRVGG ARILAVLEPR SNTMKLGTMK AALPASLKEA DQVFXYAGGA 

401 DWDVAEALAP LGGRLHVGKD FDAFVAEIVK NAEAGDHILV MSNGGFGGIH 

451 TKLLDALR* 

ORF132a and ORF132-1 show 93.9% identity in 458 aa overlap: 



orfl32a.pep 

orfl32-l 

orfl32a.pep 

orfl32-l 

orfl32a.pep 

orfl32-l 

orfl32a.pep 

orfl32-l 

orfl32a.pep 

orfl32-l 

orf 132a .pep 

orfl32-l 

orfl32a.pep 

orfl32-l 

orfl32a.pep 

orfl32-l 



MKHIHIIGIGGTFMGGIAAIAKEAGFEXSGCDAKMYPPMSTQLEALGIGVYEGFDTAQLD 

MKHIHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLD 

EFKADVYVIGNVAKRGMDWEAILNRGLPYISGPQWLAENXLHHHWXLGVAXTHGKTTTA 

EFKADVYVIGNVAKRGMDWEAILNLGLPYI3GPQWLSENVLHHHWVLGVAGTHGKTTTA 

SMLAWVLEYAGLAPGFXIGGVPENFSVSARLPQTPRQDPNSQSPFFVIEADEYDTAFFDK 

SMLAWVLEYAGLAPGFLIGGVPENFGVSARLPQTPRQDPNSQSPFFVIEADEYDTAFFDK 

RSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHHLVRTVPSEGLIVCNGRQQSLQDT 

RSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHYLVRTVPSEGLIVCNGRQQSLQDT 

LDKGCWTPVEKFGTEHGWQAGEANADGSFDVLLDGKKAGHVAWSLMGGHNRMNALAVIAA 

LDKGCWTPVEKFGTEHGWQAGEANADGSFDVLLDGKTAGRVKWDLMGRHNRMNALAVIAA 

ARHAGVDIQTACEALSTFKNVKRRMEIKGTANGITVYDDFAHHPTAIETTIQGLRQRVGG 

ARHVGVDIQTACEALGAFKNVKRRMEIKGTANGITVYDDFAHHPTAIETTIQGLRQRVGG 

ARILAVLEPRSNTMKLGTMKAALPASLKEADQVFXYAGGADWDVAEALAPLGGRLHVGKD 

ARILAVLEPRSNTMKLGTMKSALPVSLKEADQVFCYAGGVDWDVAEALAPLGGRLNVGKD 

FDAFVAEIVKNAEAGDHILVMSNGGFGGIHTKLLDALRX 

FDAFVAEIVKNAEVGDHILVMSNGGFGGIHGKLLEALRX 



Homology with a predicted ORF from N.sonorrhoeae 

ORF132 shows 89.6% identity over 259 aa overlap with a predicted ORF (ORF132ng) from N. 
gonorrhoeae: 

orf 132 .pep MKHIHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLD 50 
orfl32ng MKHIHI IGIGGTFMGGIAAIAKEAGFKVSGCDAKMYPPMSTQLEALGIGVHEGFDAAQLE 60 
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orf 132 .pep EFKADVYVIGNVAKRGMDWEAILNLGLPYISGPQWLSENVLHHHWVLGVAGTHGKTTTA 120 
orfl32ng EFQADIYVIGNVARRGMDWEAILNRGLPYISGPQWLAENVLHHHWVLGVAGTHGKTTTA 12 0 

orf 132 .pep SMLAWVLEYAGLAPGFLIGGVXGKFRRFRPPAANAAPRPEQPIAVFRHRSRRIRHRLFRQ 180 
orfl32ng SMLAWVLEYAGLAPGFLIGGVPGKFRRFRPPTANAASRPEQQIAVFRHRSRRIRHRLFRQ 180 

orf 132. pep TFXIRALPSAYRRVEQSGIRPRRHLCRLGRDTDPVPLPRAYRAVXRLNRLQRTAAKPARY 24 0 
orfl32ng TLQIRALSPAYRRVEQSGIRPRRHLRRLGRDTDPVPPPRAHRTIRRPHRLQRTAAKPARY 24 0 

orf 132. pep FGQRLLDAGGKIRHGTRLA 259 
orfl32ng FGQRLLDAGGKIRHRTRLADW 2 61 

An ORF132ng nucleotide sequence <SEQ ID 871> was predicted to encode a protein having amino 
acid sequence <SEQ ID 872>: 

1 MKHIHIIGIG GTFMGGIAAI A KEAGFKVSG CDAKMYPPMS TQLEALGIGV 

51 HEGFDAAQLE EFQADIYVIG NVARRGMDW EAILNRGLPY I3GPQWLAEN 

101 VLHHHWVLGV AGTHGKTTTA SMLAWVLEYA GLAPGFLIGG VPGKFRRFRP 

151 PTANAASRPE QQIAVFRHRS RRIRHRLFRQ TLQIRALSPA YRRVEQSGIR 

201 PRRHLRRLGR DTDPVPPPRA HRTIRRPHRL QRTAAKPARY FGQRLLDAGG 

251 KIRHRTRLAD W* 

Further work revealed the following gonococcal DNA sequence <SEQ ED 873>: 

1 ATGAAACACA TCCACATTAT CGGTATCGGC GGCACGTTTA TGGGCGGGAT 

51 TGCCGCCATT GCCAAAGAAG CCGGGTTCAA AGTCAGCGGT TGCGACGCGA 

101 AGATGTATCC GCCGATGAGC ACCCAGCTCG AAGCCTTGGG CATAGGCGTA 

151 CACGAAGGCT TCGATGCCGC GCAGTTGGAA GAATTTCAAG CCGATATTTA 

201 CGTCATCGGC AATGTCGCCA GGCGCGGGAT GGATGTGGTC GAGGCGATTT 

251 TGAACCGTGG GCTGCCTTAT ATTTCCGGCC CGCAATGGCT GGCTGAAAac 

301 GTGCtgcacc atcaTTGGgt ACTCGGCGTG GcagggaCGC ACGGcaaAac 

351 gaccaCcGcg tCCATGCTCG CCTGGGTCTT GGAATATGCC GGACTCGCGC 

4 01 CGGGCTTCCT CATCGGCGGt gtaccggaAA ATTTCGGCGT TTCCGCCCGC 

4 51 GTACCGGAAA CGCCGCGTCA AGACCCGAAC AGCAAATCGC CGTTTTTCGT 

501 CATCGAAGCC GACGAATACG ACACCGCCTT TTTCGACAAA CGCTCCAAAT 

551 TCGTGCATTA TCGCCCGCGT ACCGCCGTGT TGAACAATCT GGAATTCGAC 

601 CACGCCGACA TCTTCGCCGA CTTGGGCGCG ATACAGACCC AGTTCCACCA 

651 CCTCGTGCGC ACCGTACCAT CCGAAGGCCT CATCGTCTGC AACGGACAGC 

7 01 AGCAAAGCCT GCAAGATACT TTGGACAAAG GCTGCTGGAC GCCGGTGGAA 

7 51 AAATTCGGCA CCGGACACGG CTGGCAGATT GGTGAAGTCA ATGCCGACGG 

801 CTCGTTCGAC GTATTGCTTG ACGGCAAAAA AGCCGGACAC GTCGCATGGG 

851 ATTTGATGGG CGGACACAAC CGCATGAACG CGCTCGCCGT CATCGCTGCC 

901 GCACGCCATG CCGGAGTCGA TGTTCAGACG GCCTGCGAAG CCTTGGGTGC 

951 GTTTAAAAAC GTCAAACGCC GCATGGAAAT CAAAGGCACG GCAAACGGCA 

1001 TCACCGTTTA CGACGATTTC GCCCACCACC CGACCGCCAT CGAAACCACG 

1051 ATTCAAGGTT TGCGCCAACG TGTCGGCGGC GCGCGCATCC TCGCCGTCCT 

1101 CGAGCCGCGT TCCAACACCA TGAAACTCGG CACGATGAAG TCCGCCCTGC 

1151 CCGCAAGCCT CAAAGAAGCC GACCAAGTGT TCTGCTACGC CGGCGGCGCG 

1201 GACTGGGACG TTGCCGAAGC CCTCGCGCCT TTGGGCTGCA GGCTGCGCGT 

1251 CGGTAAAGAT TTCGATACCT TCGTTGCCGA AATTGTGAAA AACGCCCGAA 

1301 CCGGCGACCA TATTTTGGTG ATGAGCAACG GCGGTTTCGG CGGAATACAC 

1351 ACCAAACTGC TGGACGCTTT GAGATAG 

This corresponds to the amino acid sequence <SEQ ID 874; ORF132ng-l>: 

1 MKHIHIIGIG GTFMGGIAAI A KEAGFKVSG CDAKMYPPMS TQLEALGIGV 

51 HEGFDAAQLE EFQADIYVIG NVARRGMDW EAILNRGLPY ISGPQWLAEN 

101 VLHHHWVLGV AGTHGKTTTA SMLAWVLEYA GLAPGFLIGG VPENFGVSAR 

151 LPQTPRQDPN SKSPFFVIEA DEYDTAFFDK RSKFVHYRPR TAVLNNLEFD 

201 HADIFADLGA IQTQFHHLVR TVPSEGLIVC NGQQQSLQDT LDKGCWTPVE 

251 KFGTGHGWQI GEVNADGSFD VLLDGKKAGH VAWDLMGGHN RMNALAVIAA 

301 ARHAGVDVQT ACEALGAFKN VKRRMEIKGT ANGITVYDDF AHHPTAIETT 
351 IQGLRQRVGG ARILAVLEPR SNTMKLGTMK SALPASLKEA DQVFCYAGGA 
401 DWDVAEALAP LGCRLRVGKD FDTFVAEIVK NARTGDHILV MSNGGFGGIH 
451 TKLLDALR* 
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ORF132ng-l and ORF132-1 show 93.2% identity in 458 aa overlap; 

orf 132ng-l .pep MKHIHIIGIGGTFMGGIAAIAKEAGFKVSGCDAKMYPPMSTQLEALGIGVHEGFDAAQLE 
orf 132-1 MKHIHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLD 
orfl32ng-l.pep EFQADIYVIGNVARRGMDWEAILNRGLPYISGPQWLAENVLHHHWVLGVAGTHGKTTTA 
orf 132-1 EFKADVYVIGNVAKRGMDVVEAILNLGLPYISGPQWLSENVLHHHWVLGVAGTHGKTTTA 
orfl32ng-l.pep SMLAWVLEYAGLAPGFLIGGVPENFGVSARLPQTPRQDPNSKSPFFVIEADEYDTAFFDK 
orf 132-1 SMLAWVLEYAGLAPGFLIGGVPENFGVSARLPQTPRQDPNSQSPFFVIEADEYDTAFFDK 
orfl32ng-l.pep RSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHHLVRTVPSEGLIVCNGQQQSLQDT 
orf 132-1 RSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHYLVRTVPSEGLIVCNGRQQSLQDT 
orfl32ng-l.pep LDKGCWTPVEKFGTGHGWQIGEVNADGSFDVLLDGKKAGHVAWDLMGGHNRMNALAVIAA 
orf 132-1 LDKGCWTPVEKFGTEHGWQAGEANADGSFDVLLDGKTAGRVKWDLMGRHNRMNALAVIAA 
orf 132ng-l . pep ARHAGVDVQTACEALGAFKNVKRRMEIKGTANGITVYDDFAHHPTAIETTIQGLRQRVGG 
orf 132-1 ARHVGVDIQTACEALGAFKNVKRRMEIKGTANGITVYDDFAHHPTAIETTIQGLRQRVGG 
orf 132ng-l . pep ARILAVLEPRSNTMKLGTMKSALPASLKEADQVFCYAGGADWDVAEALAPLGCRLRVGKD 
orfl32-l ARI LAVLE PRSNTMKLGTMKS ALPVS LKEADQVFC YAGGVDWDVAEALAPLGGRLN VGKD 

Orfl32ng-l.pep FDTFVAEIVECNARTGDHILVMSNGGFGGIHTKLLDALRX 
orf 132-1 FDAFVAEIVKNAEVGDHILVMSNGGFGGIHGKLLEALRX 

In addition, ORF132ng-l is homologous to a hypothetical E.coli protein: 

pir||S56459 hypothetical protein o457 - Escherichia coli >gl|537075 (014003) 
ORF_o457 [Escherichia coli] >gi 11790680 (AE000494) hypothetical 48.5 kD prote: 
in fbp-pmba intergenic region [Escherichia coli] Length = 457 
Score = 474 bits (1207), Expect = e-133 

Identities = 249/439 (56%), Positives = 294/439 (66%), Gaps = 13/439 (2%) 

KEAGFKVSGCDAKMYPPMSTQLEALGIGVHEGFDAAQLEEFQADIYVIGNVARRGMDWE 81 
++ G +V+G DA +YPPMST LE GI + +G+DA+QLE Q D+ +IGN RG VE 
RQLGHEVTGSDANVYPPMSTLLEKQGIELIQGYDASQLEP-QPDLVIIGNAMTRGNPCVE 7 9 

AILNRGLPYISGPQWLAENVLHHHWVLGVAGTHGKTTTASMLAWVLEYAGLAPGFLIGGV 141 
A+L + +PY+SGPQWL + VL WVL VAGTHGKTTTA M W+LE G PGF+IGGV 
AVLEKNIPYMSGPQWLHDFVLRDRWVLAVAGTHGKTTTAGMATWILEQCGYKPGFVIGGV 139 

PENFGVSARLPQTPRQDPNSKSPFFVIEADEYDTAFFDKRSKFVHYRPRTAVLNNLEFDH 201 
P NF VSA L +S FFVIEADEYD AFFDKRSKFVHY PRT +LNNLEFDH 



ADIF DL AIQ QFHHLVR VP +G 1+ +L+ T+ GCW+ E G 





22 


Sbjct: 


21 




82 


Sbjct: 


80 




142 


Sbj ct: 


140 


Query: 


202 


Sbj ct : 


191 




2 62 


Sbjct: 


251 




321 


Sbjct: 


311 




380 


Sbjct: 


371 




439 



) S ++VLLDG+K G V W L+G HN N L lAAARH GV 



+RR+E++G ANG+TVYDDFAHHPTAI T+ LR +VGG ARI+AVLEPRSNTMK+G 



439 LVMSNGGFGGIHTKLLDAL 457 
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LVMSNGGFGGIH KLLD L 
Sbjct: 431 LVMSNGGFGGIHQKLLDGL 449 

Based on this analysis, it was predicted that these proteins from N. meningitidis and A 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF132-1 (26.4kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described 
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure 
20A shows the results of affinity purification of the His-fusion protein, and Figure 20B shows the 
results of expression of the GST-fiision in E.coli. Purified His-fiision protein was used to immunise 
mice, whose sera were used for FACS analysis (Figure 20C) and ELISA (positive result). These 
experiments confirm that ORF132 is a surface-exposed protein, and that it is a useful immunogen. 

Example 103 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 875> 

1 . .CCGGGCTATT ACGGCTCGGA TGACGAATTT AAGCGGGCAT TCGGAGAAAA 

51 CTCGCCGACA TinCAAGAAAC ATTGCAACCG GAGCTGCGGG ATTTATGAAC 

101 CCGTATTGAA AAAATACGGC AAAAAGCGCG CCAACAACCA TTCGGTCAGC 

151 ATTAGTGCGG ACTTCGGCGA TTATTTCATG CCGTTCGCCA GCTATTCGCG 

201 CACACACCGT ATGCCCAACA TCCAAGAAAT GTATTTTTCC CAAATCGGCG 

251 ACTCCGGCGT TCACACCGCC TTAAAACCAG AGCGCGCAAA CACTTGGCAA 

301 TTTGGCTTCr ATACCTATAA AAAAGGATTG TTAAAACAAG ATGATACATT 

351 AGGATTAAAA CTGGTCGGCT ACCGCAGCCG CATCGACAAC TACATCCACA 

4 01 ACGTTTACGG GAAATGGTGG GATTTGAACG GGGATATTCC GAGCTGGGTC 

4 51 AGCAGCACCG GGCTTGCCTA CACCATCCAA CATCGCrATT TCAwAGACAA 

501 AGTGCATCAA nnnnnnnnnn nnnnnnnnnn nnnnTACGAT TATGGGCGTT 

551 TTTTCACCAA CCTTTCTTAC GCCTATCAAA AAAGCACGCA ACCGACCAAC 

601 TTCAGCGATG CGAGCGAATC GCCCAACAAT GCGTCCAAAG AAGACCAACT 

651 CAAACAAGGT TATGGGTTGA GCAGGGTTTC CGCCCTGCCG CGAGATTACG 

701 GACGTTTGGA AGTCGGTACG CGCTGGTTGG GCAACAAACT GACTTTGGGC 

751 GGCGCGATGC GCTATTTCGG CAAGAGCATC CGCGCGACGG CTGAAGAACG 

801 CTATATCGAC GGCACCAACG GGGGAAATAC CAGCAATTTC CGGCAACTGG 

851 GCAAGCGTTC CATCAAACAA ACCGAAACTC TTGCCCGCCA GCCTTTGATT 

901 TTwGATTTTa ACGCCGCTTA CGAGCCGAAG AAAAACCTTA TTTTCCGCGC 

951 CGAAGTCAAA AATCTGTTCG ACAGGCGTTA TATCGATCCG CTCGATGCGG 

1001 GCAATGATGC GGCAAC.GAG CGTTATTACA GCTCGTTCGA CCCGAAAGAC 

1051 AAGGACrrAG ACGTAACGTG TAATGCTGAT AAAACGTTGT GCaACGGCAA 

1101 ATACGGCGGC ACAAGCAAAA GCGTATTGAC CAATTTTGCA CGCGGACGCA 

1151 CCTTTTTgAT GACGATGAGC TACAAGTTTT AA 

This corresponds to the amino acid sequence <SEQ ID 876; ORF133>: 

1 . . PGrreSDDEF KRAFGENSPT XKKHCNRSCG lYEPVLKKYG KKRANNHSVS 

51 ISADFGDYFM PFASYSRTHR MPNIQEMYFS QIGDSGVHTA LKPERANTWQ 

101 FGFXTYKKGL LKQDDTLGLK LVGYRSRIDN YIHNVYGKWW DLNGDIPSWV 

151 SSTGLAYTIQ HRXFXDKVHQ XXXXXXXXYD YGRFFTNLSY AYQKSTQPTN 

201 FSDASESPNN ASKEDQLKQG YGLSRVSALP RDYGRLEVGT RWLGNKLTLG 

251 GAMRYFGKSI RATAEERYID GTNGGNTSNF RQLGKRSIKQ TETLARQPLI 

301 XDFNAAYEPK KNLIFRAEVK NLFDRRYIDP LDAGNDAAXE RYYSSFDPKD 

351 KDXDVTCNAD KTLCNGKYGG TSKSVLTNFA RGRTFLMTMS YKF* 

Further work revealed the fiirther partial DNA sequence <SEQ ID 877>: 

1 GAGGCGCAGA TACAGGTTTT GGAAGATGTG CACGTCAAGG CGAAGCGCGT 

51 ACCGAAAGAC AAAAAAGTGT TTACCGATGC GCGTGCCGTA TCGACCCGTC 

101 AGGATATATT CAAATCCAGC GAAAACCTCG ACAACATCGT ACGCAGCATC 

151 CCCGGTGCGT TTACACAGCA AGATAAAAGC TCGGGCATTG TGTCTTTGAA 

201 TATTCGCGGC GACAGCGGGT TCGGGCGGGT CAATACGATG GTGGACGGCA 

251 TCACGCAGAC CTTTTATTCG ACTTCTACCG ATGCGGGCAG GGCAGGCGGT 
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1001 
1051 
1101 
1151 
1201 
1251 
1301 
1351 
1401 
1451 
1501 
1551 
1601 
1651 
1701 
1751 
1801 
1851 
1901 
1951 
2001 
2051 
2101 
2151 
2201 
2251 
2301 
2351 
2401 
2451 
2501 
2551 
2601 
2651 



TCATCTCAAT 
TGTCGTCAAA 
GTTCGGCGAA 
AATACCTACG 
AGGTAATGCG 
CATCTGTCGG 
TACCGCGTGG 
TTTGGAACGG 
TCAATTCCGA 
AAATACAAGC 
CGAAGAGCAT 
TTACCCCCAT 
TTTAAATTGG 
CGATTTAAAC 
AGTTCAATTA 
GCAGCCTACA 
AGGCTGGGGG 
TCGACCTCAA 
CAAACCACTT 
CTTTCCTGAA 
GGCTTTATTC 
CAAAAATCAA 
CTACTTCGAT 
CCAATACCGT 
TCGGATGACG 
GAAACATTGC 
ACGGCAAAAA 
GGCGATTATT 
CAACATCCAA 
CCGCCTTAAA 
TATAAAAAAG 
CGGCTACCGC 
GGTGGGATTT 
GCCTACACCA 
TTTTGAGTTG 
CTTACGCCTA 
GAATCGCCCA 
GTTGAGCAGG 
GTACGCGCTG 
TTCGGCAAGA 
CAACGGGGGA 
AACAAACCGA 
GCTTACGAGC 
GTTCGACAGG 
CGCAGCGTTA 
ACGTGTAATG 
CAAAAGCGTA 
TGAGCTACAA 



TCGGTGCATC 
GGCAGCTTCA 
TCTGCGGACT 
GCCTGCTGCT 
ATGGCGGCGA 
TGTGCTTTAC 
GCGGCGGCGG 
CGCAAGCAGC 
CAGCGGAAAA 
CGTATAAAAA 
GACAAAAGCT 
CGATCCGTCC 
AATACGACGG 
ACCAAAATCG 
CGGTTTGTCT 
ATTCGGGCAG 
CTTTTAAAGG 
CAACACCGCC 
TGGGCTTCAA 
GAATTGGGGC 
CTATTTGGGG 
CCATTGTCCA 
GCCGCGCTCA 
CGGCTACCGT 
AATTTAAGCG 
AACCGGAGCT 
GCGCGCCAAC 
TCATGCCGTT 
GAAATGTATT 
ACCAGAGCGC 
GATTGTTAAA 
AGCCGCATCG 
GAACGGGGAT 
TCCAACATCG 
GAGCTGAATT 
TCAAAAAAGC 
ACAATGCGTC 
GTTTCCGCCC 
GTTGGGCAAC 
GCATCCGCGC 
AATACCAGCA 
AACTCTTGCC 
CGAAGAAAAA 
CGTTATATCG 
TTACAGCTCG 
CTGATAAAAC 
TTGACCAATT 
GTTTTAA 



TGTCGACAGC 
GCGGCTCGGC 
TTAGGCGTGG 
AAAAGGTCTG 
TAGGTGCGCG 
GGGCACAGCA 
GCAGCACATC 
GATATTTTGT 
TGGGAGCGGG 
TTACAACAAC 
GGCGGGAAAA 
AGCCTGAAGC 
CGTATTCAAT 
GCAGCCGCAA 
TTGAACCCGT 
GCAGAAATAT 
ATTTTGAAAC 
ACCTTCCGGC 
TTATTTCCAC 
TGTTTTTCGA 
CGGTTTAAGG 
ACCGGCCGGC 
AAAAAGACAT 
TTCGGCGGCG 
GGCATTCGGA 
GCGGGATTTA 
AACCATTCGG 
CGCCAGCTAT 
TTTCCCAAAT 
GCAAACACTT 
ACAAGATGAT 
ACAACTACAT 
ATTCCGAGCT 
CAATTTCAAA 
ACGATTATGG 
ACGCAACCGA 
CAAAGAAGAC 
TGCCGCGAGA 
AAACTGACTT 
GACGGCTGAA 
ATTTCCGGCA 
CGCCAGCCTT 
CCTTATTTTC 
ATCCGCTCGA 
TTCGACCCGA 
GTTGTGCAAC 
TTGCACGCGG 



AATTTTATTG 
AGGCATCAAC 
ATGACGTCGT 
ACCGGCACCA 
CAAATGGCTG 
GGCGCAGCGT 
GGAAATTTTG 
ACAAGAGGGT 
ATTTACAAAG 
CAAGAACTAC 
CCTg . CaCCG 
AGCAGTCGGC 
AAATACACGG 
AATCATCAAC 
ATACCAACCT 
CCGAAAGGGT 
CTACAACAAC 
TGCCCCGCGA 
AACGAATACG 
CGGTCCTGAT 
GCGATAAAGG 
AGCCAATATT 
TTACCGCTTA 
AATATACGGG 
GAAAACTCGC 
TGAACCCGTA 
TCAGCATTAG 
TCGCGCACAC 
CGGCGACTCC 
GGCAATTTGG 
ACATTAGGAT 
CCACAACGTT 
GGGTCAGCAG 
GACAAAGTGC 
GCGTTTTTTC 
CCAACTTCAG 
CAACTCAAAC 
TTACGGACGT 
TGGGCGGCGC 
GAACGCTATA 
ACTGGGCAAG 
TGATTTTTGA 
CGCGCCGAAG 
TGCGGGCAAT 
AAGACAAGGA 
GGCAAATACG 
ACGCACCTTT 



CCGGACTGGA 
AGCCTTGCCG 
TCAGGGCAAT 
ATTCAACCAA 
GAAAGCGGAG 
GGCGCAAAAT 
GCGCGGAATA 
GCTTTGAAAT 
GCAACAGTGG 
AaAAATACAT 
CAATACGACA 
AGGCAATCTG 
CGCAATTTCG 
CGCAATTATC 
CAATCTGACC 
CGAAGTTTAC 
GCGAAAATCC 
AACCGAGTTG 
GCAAAAACCG 
CAGGACAACG 
GCTGCTGCCC 
TCAACACGTT 
AACTACAGCA 
CTATTACGGC 
CGACATACAA 
TTGAAAAAAT 
TGCGGACTTC 
ACCGTATGCC 
GGCGTTCACA 
CTTCAATACC 
TAAAACTGGT 
TACGGGAAAT 
CACCGGGCTT 
ACAAACACGG 
ACCAACCTTT 
CGATGCGAGC 
AAGGTTATGG 
TTGGAAGTCG 
GATGCGCTAT 
TCGACGGCAC 
CGTTCCATCA 
TTTTTACGCC 
TCATWyVTCT 
GATGCGGCAA 
CGAAGACGTA 
GCGGCACAAG 
TTGATGACGA 



This corresponds to the amino acid sequence <SEQ ID 878; ORF133-l>: 



EAQIQVLEDV 
PGAFTQQDKS 
SSQFGASVDS 
NTYGLLLKGL 
YRVGGGGQHI 
KYKPYKNYNN 
FKLEYDGVFN 
AAYNSGRQKY 
QTTLGFNYFH 
QKSTIVQPAG 
SDDEFKRAFG 
GDYFMPFASY 
YKKGLLKQDD 
AYTIQHRNFK 
ESPNNASKED 
FGK3IRATAE 
AYEPKKNLIF 
TCNADKTLCN 



HVKAKRVPKD 
SGIVSLNIRG 
NFIAGLDWK 
TGTNSTKGNA 
GNFGAEYLER 
QELQKYIEEH 
KYTAQFRDLN 
PKGSKFTGWG 
NEYGKNRFPE 
SQYFNTFYFD 
ENSPTYKKHC 
SRTHRMPNIQ 
TLGLKLVGYR 
DKVHKHGFEL 
QLKQGYGLSR 
ERYIDGTNGG 
RAEVKNLFDR 
GKYGGTSKSV 



KKVFTDARAV 
DSGFGRVNTM 
GSFSGSAGIN 
MAAIGARKWL 
RKQRYFVQEG 
DKSWRENLXP 
TKIGSRKIIN 
LLKDFETYNN 
ELGLFFDGPD 
AALKKDIYRL 
NRSCGIYEPV 
EMYFSQIGDS 
SRIDNYIHNV 
ELNYDYGRFF 
V3ALPRDYGR 
NTSNFRQLGK 
RYIDPLDAGN 
LTNFARGRTF 



STRQDIFKSS 
VDGITQTFYS 
SLAGSANLRT 
ESGASVGVLY 
ALKFNSDSGK 
QYDITPIDPS 
RNYQFNYGLS 
AKILDLNNTA 
QDNGLYSYLG 
NYSTNTVGYR 
LKKYGKKRAN 
GVHTALKPER 
YGKWWDLNGD 
TNLSYAYQKS 
LEVGTRWLGN 
RSIKQTETLA 
DAATQRYYSS 
LMTMSYKF* 



ENLDNIVRSI 
TSTDAGRAGG 
LGVDDVVQGN 
GHSRRSVAQN 
WERDLQRQQW 
SLKQQSAGNL 
LNPYTNLNLT 
TFRLPRETEL 
RFKGDKGLLP 
FGGEYTGrrG 
NHSVSISADF 
ANTWQFGFNT 
IPSWVSSTGL 
TQPTNFSDAS 
KLTLGGAMRY 
RQPLIFDFYA 
FDPKDKDEDV 



Computer analysis of this amino acid sequence gave the following results: 
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Homology with with the probable TonB-dependent receptor HI121 of H.influenzae (accession number U32801) 
ORF133 and HI121 show 57% aa identity in 363aa overlap: 



Orfl33; 
HI121: 
Orfl33: 
HI121: 
Orfl33; 
HI121: 
Orfl33: 
HI121: 
Orfl33: 
HI121: 
Orfl33; 
HI121: 
Orfl33; 
HI121: 



31 lYEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTA 90 

I EP+L K G K+A NHS ++SA+ DYFMPF +YSRTHRMPNIQEM+FSQ+ ++GV+TA 
563 INEPILHKSGHKKAFNHSATLSAELSDYFMPFFTYSRTHRMPNIQEMFFSQVSNAGVNTA 622 

91 LKPERANTWQFGFXTYKKGLLKQDDTLGLKLVGYRSRIDNYIHNVYGPCWWDLNGDIPSWV 150 

LKPE+++T+Q GF TYKKGL QDD LG+KLVGYRS I NYIHNVYG WW +P+W 
623 LKPEQSDTYQLGFNTYKKGLFTQDDVLGVKLVGYRSFIKNYIHNVYGVWW— RDGMPTWA 680 

151 SSTGLAYTIQHRXFXDKVHXXXXXXXXXYDYGRFFTNLSYAYQKSTQPTNFSDASESPNN 210 

S G YTI H+ + V YD GRFF N+SYAYQ++ QPTN++DAS PNN 

681 ESNGFKYTIAHQNYKPIVKKSGVELEINYDMGRFFANVSYAYQRTNQPTNYADASPRPNN 740 

211 ASKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKS IRATAEERYI D 270 
AS+ED LKQGYGLSRVS LP+DYGRLE+GTRW KLTLG A RY+GKS RAT EE YI+ 

7 41 ASQEDILKQGYGLSRVSMLPKDYGRLELGTRWFDQKLTLGLAARYYGKSKRATIEEEYIN BOO 

271 GTNGGNTSNFRQLGKRSIKQTETLARQPLXXDFNAAYEPKKNLIFRAEVKNLFDRRYIDP 330 

G+ + R+ ++K+TE + +QP+I D + +YEP K+LI +AEV+NL D+RY+DP 

801 GSR-FKKNTLRRENYYAVKKTEDIKKQPIILDLHVSYEPIKDLIIKAEVQNLLDKRYVDP 859 

331 LDAGNDAAXERYYSSFDPKDKDXDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMS 3 90 
LDAGNDAA +RYYSS + + C D + C GG+ K+VL NFARGRT++++++ 

8 60 LDAGNDAASQRYYSSL NNSIECAQDSSAC GG3DKTVLYNFARGRTYILSLN 910 

391 YKF 393 

YKF 
911 YKF 913 



Homology with a predicted ORF from N.meninsitidis Cstrain A) 

ORF133 shows 90.8% identity over a 392aa overlap with an ORF (ORF133a) from strain A oiN. 
meningitidis: 



PGYYGS DDE FKRAFGENS PTXKKHCNRSCGI 
III I I I I I I I I II II I I I 1111:1111 
FYFDAALKKDIYRLNYSTNTVGYRFGGXYTGYYXSDDEFKRAFGENSPTYXKHCNQSCGI 



irfl33.pep YEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTAL 
>rfl33a YEPVLKKYGKKRANNH3VSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGD3GVHTAL 



)rfl33.pep KPERANTWQFGFXTYKKGLLKQDDTLGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWVS 
>rfl33a KPERANTWQFGFNTYKKGLLKQDDILGLKLVGYRSRIDXYIHNVYGKWWDLNGNIPSWVS 



160 170 180 190 200 210 

orfl33.pep STGLAYTIQHRXFXDKVHQXXXXXXXXYDYGRFFTNLSYAYQKSTQPTNFSDASESPNNA 

orfl33a STGLAYTIQHRNFKDKVHKHGFELELNYDYXRFFTNLSYAYQKSTQPTNFSDASESPNNA 
630 640 650 660 670 680 



220 230 240 250 260 270 

orf 133 . pep SKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDG 

orfl33a SKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDX 
690 700 710 720 730 740 



orf 133 . pep 



280 290 300 310 320 330 

TNGGNTSNFRQLGKRSIKQTETLARQPLIXDFNAAYEPKKNLIFRAEVKNLFDRRYIDPL 
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TNGXXTSNFRQLGKRSIXQTETLARQPLIFDXYAAYEPKKXLIFRAEVKNLFDRRYIDPL 



DAGNDAAXERYYSSFDPKDKDXDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMSY 
DAGNDAATQRYY3SFDPKDKDEEVTCNDDNTLCNGKYGGTSKSVLTNFARGXTFLITMSY 



orfl33.pep KFX 
I I I 

orfl33a KFX 
870 

A partial ORF133a nucleotide sequence <SEQ ID 879> is: 

1 AAAGACAAAA AAGTGTTTAC CGATGCGCGT GCCGTATCGA CCCGTCAGGA 

51 TATATTCAAA TCCANCGAAA ACCTCGACAA CATCGTACGC ANCATCCCCG 

101 GTGCGTTTAC ACANCAANAT AAAAGCTCGG GCNTTGTGTC TTTGAATATT 

151 CGCNGCGACA GCGGGTTCGG GCGGGTCAAT ACNATGGTNG ACGGCATCAC 

201 NCANACCTTT TATTCGACTT CTACCGATGC GGGCAGGGCA GGCGGTTCAT 

251 CTCAATTCGG TGCATCTGTC GACAGCAATT TTATNGCCGG ACTGGATGTC 

301 GTCAAAGGCA GCTTCAGCGG CTCGGCAGGC ATCAACAGCC TTGCCGGTTC 

351 GGCGAATCTG CGGACTTTAN GCGTGGATGA TGTCGTTCAG GGCAATANTA 

4 01 CNTACGGCCT GCTGCTAAAA GGTCTGACCG GCACCAATTC AACCAAAGGT 

4 51 AATGCGATGG CGGCGATAGG TGCGCGCAAA TGGCTGGAAA GCGGAGCATC 

501 TGTCGGTGTG CTTTACGGGC ACAGCAGGCG CAGCGTGGCG CAAAATTACC 

551 GCGTGGGCGG CGGCGGGCAG CACATCGGAA ATTTTGGCGC GGAATATCTG 

601 GAACGACGCA AGCAACGATA TTTTGAGCAA GAAGGCGGGT TGAAATTCAA 

651 TTCCAACAGC GGAAAATGGG AGCGGGATTT CCAAAAGTCG TACTGGAAAA 

7 01 CCAAGTGGTA TCAAAAATAC GATGCCCCCC AAGAACTGCA AAAATACATC 

7 51 GAAGGTCATG ATAAAAGCTG GCGGGAAAAC CTGGCGCCGC AATACGACAT 

801 CACCCCCATC GATCCGTCCA GCCTGAAGCN GCAGTCGGCA GGCAACCTGT 

851 TTAAATTGGA ATACGACGGC GTATTCAATA AATACACGGC GCAATTTCGC 

901 GATTTAAACA CCAAAATCGG CAGCCGCAAA ATCATCAACC GCAATTATCA 

951 ATTCAATTAC GGTTTGTCTT TGAACCCGTA TACCAACCTC AATCTGACCG 

1001 CAGCCTACAA TTCGGGCAGG CAGAAATATC CGAAAGGGTC GAAGTTTACA 

1051 GGCTGGGGGC TTTTNAAAGA TTTTGAAACC TACAACAACG CAAAAATCCT 

1101 CGACCTCANC AACACCTCCA CCTTCCGGCT GCCCCGTGAA ACCGAGTTGC 

1151 AAACCACTTT GGGCTTCAAT TATTTCCACA ACGAATACGG CAAAAACCGC 

1201 TTTCCTGAAG AATTGGGGCT GTTTTTCGAC GGTCCGGATC ANGACAACGG 

1251 GCTTTATTCC TATTTGGGGC GGTTTAAGGG CGATAAAGGG CTGCTGCCCC 

1301 AAAAATCAAC CATTGTCCAA CCGGCCGGCA GCCAATATTT CAACACGTTC 

1351 TACTTCGATG CCGCGCTCAA AAAAGACATT TACCGCTTAA ACTACAGCAC 

1401 CAATACCGTC GGCTACCGTT TCGGCGGCNA ATATACGGGC TATTACNGCT 

1451 CGGATGACGA ATTTAAGCGG GCATTCGGAG AAAACTCGCC GACATACANG 

1501 AAACATTGCA ACCAGAGCTG CGGAATTTAT GAACCCGTAT TGAAAAAATA 

1551 CGGCAAAAAG CGCGCCAACA ACCATTCGGT CAGCATTAGT GCGGACTTCG 

1601 GCGATTATTT CATGCCGTTC GCCAGCTATT CGCGCACACA CCGTATGCCC 

1651 AACATCCAAG AAATGTATTT TTCCCAAATC GGCGACTCCG GCGTTCACAC 

1701 CGCCTTAAAA CCAGAGCGCG CAAACACTTG GCAATTTGGC TTCAATACCT 

1751 ATAAAAAAGG ATTGTTAAAA CAAGATGATA TATTAGGATT AAAACTGGTC 

1801 GGCTACCGCA GCCGCATCGA CNACTACATC CACAACGTTT ACGGGAAATG 

1851 GTGGGATTTG AACGGGAATA TTCCGAGCTG GGTCAGCAGC ACCGGGCTTG 

1901 CCTACACCAT CCAACACCGC AATTTCAAAG ACAAAGTGCA CAAACACGGT 

1951 TTTGAGTTGG AGCTGAATTA CGATTATNGG CGTTTTTTCA CCAACCTTTC 

2001 TTACGCCTAT CAAAAAAGCA CGCAACCGAC CAACTTCAGC GATGCGAGCG 

2051 AATCGCCCAA CAATGCGTCC AAAGAAGACC AACTCAAACA AGGTTATGGG 

2101 TTGAGCAGGG TTTCCGCCCT GCCGCGAGAT TACGGACGTT TGGAAGTCGG 

2151 TACGCGCTGG TTGGGCAACA AACTGACTTT GGGCGGCGCG ATGCGCTATT 

2201 TCGGCAAGAG CATCCGCGCG ACGGCTGAAG AACGCTATAT CGACGNCACC 

2251 AATGGGGNAN NTACCAGCAA TTTCCGGCAA CTGGGCAAGC GTTCCATCAN 

2301 ACAAACCGAA ACCCTTGCCC GCCAGCCTTT GATTTTTGAT TTNTACGCCG 

2351 CTTACGAGCC GAAGAAAAAN CTTATTTTCC GCGCCGAAGT CAAAAATCTG 

2401 TTCGACAGGC GTTATATCGA TCCGCTCGAT GCGGGCAATG ATGCGGCAAC 

2451 GCAGCGTTAT TACAGTTCGT TCGACCCGAA AGACAAGGAC GAAGAAGTAA 

2501 CGTGTAATGA TGATAACACG TTATGCAACG GCAAATACGG CGGCACAAGC 

2551 AAAAGCGTAT TGACCAATTT TGCACGCGGA CNCACCTTTT TGATAACGAT 

2 501 GAGCTACAAG TTTTAA 
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This encodes a protein having (partial) amino acid sequence <SEQ ID 880>: 



KDKKVFTDAR 
RXDSGFGRVN 
VKGSFSGSAG 
NAMAAIGARK 
ERRKQRYFEQ 
EGHDKSWREN 
DLNTKIGSRK 
GWGLXKDFET 
FPEELGLFFD 
YFDAALKKDI 
KHCNQSCGIY 
NIQEMYFSQI 
GYRSRIDXYI 
FELELNYDYX 
LSRVSALPRD 
NGXXTSNFRQ 
FDRRYIDPLD 
KSVLTNFARG 



AVSTRQDIFK 
TMVDGITXTF 
INSLAGSANL 
WLESGASVGV 
EGGLKFNSNS 
LAPQYDITPI 
IINRNYQFNY 
YNNAKILDLX 
GPDXDNGLYS 
YRLNYSTNTV 
EPVLKKYGKK 
GDSGVHTALK 



RFFTNLSYAY 
YGRLEVGTRW 
LGKRSIXQTE 
AGNDAATQRY 
XTFLITMSYK 



SXENLDNIVR 
YSTSTDAGRA 
RTLXVDDWQ 
LYGHSRRSVA 
GKWERDFQKS 
DPSSLKXQSA 
GLSLNPYTNL 
NTSTFRLPRE 
YLGRFKGDKG 
GYRFGGXYTG 
RANNHSVSIS 
PERANTWQFG 
NGNIPSWVSS 
QKSTQPTNFS 
LGNKLTLGGA 
TLARQPLIFD 
YSSFDPKDKD 



XIPGAFTXQX 
GGSSQFGASV 
GNXTYGLLLK 
QNYRVGGGGQ 
YWKTKWYQKY 
GNLFKLEYDG 
NLTAAYNSGR 
TELQTTLGFN 
LLPQKSTIVQ 
YYXSDDEFKR 
ADFGDYFMPF 
FNTYKKGLLK 
TGLAYTIQHR 
DASESPNNAS 
MRYFGKSIRA 
XYAAYEPKKX 
EEVTCNDDNT 



KSSGXVSLNI 
DSNFXAGLDV 
GLTGTNSTKG 
HIGNFGAEYL 
DAPQELQKYI 
VFNKYTAQFR 
QKYPKGSKFT 
YFHNEYGKNR 
PAGSQYFNTF 
AFGENSPTYX 
ASYSRTHRMP 
QDDILGLKLV 
NFKDKVHKHG 
KEDQLKQGYG 
TAEERYIDXT 
LIFRAEVKNL 
LCNGKYGGTS 



20 ORF133a and ORF133-1 show 94.3% identity in 871 aa overlap: 



KDKKVFTDARAVSTRQDIFKSXENLDNIVRXIPGAFTXQXKS 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 
EAQIQVLEDVHVKAKRVPKDKKVFTDARAVSTRQDIFKSSENLDNIVRSIPGAFTQQDKS 



SGXVSLNIRXDSGFGRVNTMVDGITXTFYSTSTDAGRAGGSSQFGASVDSNFXAGLDVVK 
SGIVSLNIRGDSGFGRVNTMVDGITQTFYSTSTDAGRAGGSSQFGASVDSNFIAGLDWK 



irf 133a. pep GSFSGSAGINSLAGSANLRTLXVDDWQGNXTYGLLLKGLTGTNSTKGNAMAAIGARKWL 
irf 133-1 GSFSGSAGINSLAGSANLRTLGVDDWQGNNTYGLLLKGLTGTNSTKGNAMAAIGARKWL 



ESGASVGVLYGHSRRSVAQNYRVGGGGQHIGNFGAEYLERRKQRYFEQEGGLKFNSNSGK 
I I I I I II II I I II I I I I I II M II I I I II I I I I I I I II I I I I I I I I II I : I I I I I : I I 
ESGASVGVLYGHSRRSVAQNYRVGGGGQHIGNFGAEYLERRKQRYFVQEGALKFNSDSGK 



orf 133a. pep WERDFQKSYWKTKWYQKYDAPQELQKYIEGHDKSWRENLAPQYDITPIDPSSLKXQSAGN 
orf 133-1 WERDLQRQQWKYKPYKNYNN-QELQKYIEEHDKSWRENLXPQYDITPIDPSSLKQQSAGN 



LFKLEYDGVFNKYTAQFRDLNTKIGSRKIINRNYQFNYGLSLNPYTNLNLTAAYNSGRQK 
I I I II II I I I I II I II I I I I I I I I I II I I I I I I I I I I I I I I II I I II I I I I I I I I I II II 
LFKLEYDGVFNKYTAQFRDLNTKIGSRKIINRNYQFNYGLSLNPYTNLNLTAAYNSGRQK 



orf 133a . pep YPKGSKFTGWGLXKDFETYNNAKILDLXNTSTFRLPRETELQTTLGFNYFHNEYGKNRFP 
orf 133-1 YPKGSKFTGWGLLKDFETYNNAKILDLNNTATFRLPRETELQTTLGFNYFHNEYGKNRFP 



EELGLFFDGPDXDNGLYSYLGRFKGDKGLLPQKSTIVQPAGSQYFNTFYFDAALKKDIYR 
EELGLFFDGPDQDNGLYSYLGRFKGDKGLLPQKSTIVQPAGSQYFNTFYFDAALKKDIYR 
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LNYSTNTVGYRFGGXYTGYYXS DDE FKRAFGENS PT YXKHCNQSCGI YEPVLKKYGKKRA 
LNYSTNTVGYRFGGEYTGYYGSDDEFKRAFGENSPTYKKHCNRSCGIYEPVLKKYGKKRA 



NNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTALKPERANTWQFGFN 
NNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTALKPERANTWQFGFN 



orfl33a.pep 



TYKKGLLKQDDILGLKLVGYRSRIDXYIHNVYGKWWDLNGNIPSWVSSTGLAYTIQHRNF 
TYKKGLLKQDDTLGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWVSSTGLAYTIQHRNF 



KDKVHKHGFELELNYDYXRFFTNLSYAYQKSTQPTNFSDASESPNNASKEDQLKQGYGLS 

I I I I I I I I I Ill I I II II I I I II II I I I I II II I II II I I II I I I I I I I I I II I 

KDKVHKHGFELELNYDYGRFFTNLSYAYQKSTQPTNFSDASESPNNASKEDQLKQGYGLS 



RVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDXTNGXXTSNFRQLG 
RVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDGTNGGNTSNFRQLG 



KRSIXQTETLARQPLIFDXYAAYEPKKXLIFRAEVKNLFDRRYIDPLDAGNDAATQRYYS 
KRSIKQTETLARQPLIFDFYAAYEPKKNLIFRAEVKNLFDRRYIDPLDAGNDAATQRYYS 



orf 133a . pep SFDPKDKDEEVTCNDDNTLCNGKYGGTSKSVLTNFARGXTFLITMSYKFX 
orf 133-1 SFDPKDKDEDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMSYKFX 



Homology with a predicted ORF from N. gonorrhoeae 

ORF133 shows 92.3% identity over 392 aa overlap with a predicted ORF (ORF133ng) from N. 
gonorrhoeae: 



orf 133 .pep 
orf 133ng 
orf 133 .pep 
orfl33ng 
orfl33.pep 
orfl33ng 
orf 133. pep 
orfl33ng 
orf 133 .pep 
orfl33ng 
orfl33.pep 
orfl33ng 



PGYYGSDDEFKRAFGENSPTXKKHCNRSCGI 31 

FYFDAALKKDIYRLNYSTNAINYRFGGEYTGYYGSENEFKRAFGENSPAYKEHCDPSCGL 560 

YEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTAL 91 

YEPVLKKYGKKRANNHSVSISADFGDYFMPFAGYSRTHRMPNIQEMYFSQIGDSGVHTAL 620 

KPERANTWQFGFXTYKKGLLKQDDTLGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWVS 151 

KPERANTWQFGFNTYKKGLLKQDDILGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWVG 680 

STGLAYTIQHRXFXDKVHQXXXXXXXXYDYGRFFTNLSYAYQKSTQPTNFS DASE S PNNA 211 

STGLAYTIRHRNFKDKVHKHGFELELNYDYGRFFTNLSYAYQKSTQPTNFSDASESPNNA 740 

SKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDG 271 

SKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDG 800 



TNGGNTSNFRQLGKRSIKQTETLARQPLIXDFNAAYEPKKNLIFRAEVKNLFDRRYIDPL 
I I II I I I I II I II I I I I I I I I I I I I I I I II I I I I II I I I I I I I I I I I I I I I I I I I I I 
TNGGNTSNVRQLGKRSIKQTETLARQPLIFDFYAAYEPKKNLIFRAEVKNLFDRRYIDPL 



331 
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orfl33.pep DAGNDAAXERYYSSFDPKDKDXDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMSY 391 

orfl33ng DAGNDAATQRYYSSFDPKDKDEDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMSY 920 

orfl33.pep KF 393 
I I 

orfl33ng KF 922 

The complete length ORF133ng nucleotide sequence <SEQ ID 881> is predicted to encode a 
protein having amino acid sequence <SEQ ID 882>: 

1 MRSSFRLKPI CFYLMGVMLY HHSYAEDAGR AGSEAQIQVL EDVHVKAKRV 

51 PKDKKVFTDA RAVSTRQDVF KSGENLDNIV RSIPGAFTQQ DKSSGIVSLN 

101 IRGDSGFGRV NTMVDGITQT FYSTSTDAGR AGGSSQFGAS VDSNFIAGLD 

151 VVKGSFSGSA GINSLAGSAN LRTLGVDDW QGNNTYGLLL KGLTGTNSTK 

201 GNAMAAIGAR KWLESGASVG VLYGHSRRGV AQNYRVGGGG QHIGNFGEEY 

251 LERRKQQYFV QEGGLKFNAG SGKWERDLQR QYWKTKWYKK YEDPQELQKY 

301 lEEHDKSWRE NLAPQYDITP IDPSGLKQQS AGNLlI\nLEYD GVFNKYTAQF 

351 RDLNTRIGSR KIINRNYQFN YGLSLNPYTN LNLTAAYNSG RQKYPKGAKF 

401 TGWGLLKDFE TYNNAKILDL NNTATFRLPR ETELQTTLGF NYFHNEYGKN 

451 RFPEELGLFF DGPDQDNGLY SYLGRFKGDK GLLPQKSTIV QPAGSQYFNT 

501 FYFDAALKKD lYRLNYSTNA INYRFGGEYT GYYGSENEFK RAFGENSPAY 

551 KEHCDPSCGL YEPVLKKYGK KRANNHSVSI SADFGDYFMP FAGYSRTHRM 

601 PNIQEMYFSQ IGDSGVHTAL KPERANTWQF GFNTYKKGLL KQDDILGLKL 

651 VGYRSRIDNY IHNVYGKWWD LNGDIPSWVG STGLAYTIRH RNFKDECVHKH 

7 01 GFELELNYDY GRFFTNLSYA YQKSTQPTNF SDASESPNNA SKEDQLKQGY 

751 GLSRVSALPR DYGRLEVGTR WLGNK LTLGG AMRYFGKS IR ATAEERYIDG 

801 TNGGNTSNVR QLGKRSIKQT ETLARQPLIF DFYAAYEPKK NLIFRAEVKN 

851 LFDRRYIDPL DAGNDAATQR YYSSFDPKDK DEDVTCNADK TLCNGKYGGT 

901 SKSVLTNFAR GRTFLMTMSY KF* 

A variant was also identified, being encoded by the gonococcal DNA sequence <SEQ ID 883>: 



1 ATGAGATCTT CTTTCCGGTT 

51 TATGCTATAT CATCATAGTT 

101 AGGCGCAGAT ACAGGTTTTG 

151 CCGAAAGACA AAAAAGTGTT 

201 gGATGTGTTC AAATCCGGCG 

251 CCGGTGCGTT TACACAGCAA 

301 ATTCGCGGCG ACAGCGGGTT 

351 CACGCAGACC TTTTATTCGA 

401 CATCTCAATT CGGTGCATCT 

451 GTCGTCAAAG GCAGCTTCAG 

501 TTCGGCGAAT CTGCGGACTT 

551 ATACCTACGG CCTGCTGCTA 

601 GGTAATGCGA TGGCGGCGAT 

651 GTCTGTCGGT GTGCTTTACG 

701 ACCGCGTGGG CGGCGGCGGG 

751 CTGGAACGGC GCAAACAGCA 

801 CAATGCCGGC AGCGGAAAAT 

851 AAACAAAGTG GTATAAAAAA 

901 ATCGAAGAGC ATGATAAAAG 

951 CATCACCCCC ATCGATCCGT 

1001 TGTTTAAATT GGAATACGAC 

1051 CGCGATTTAA ACACCAGAAT 

1101 TCAATTCAAT TACGGTTTGT 

1151 CCGCAGCCTA CAATTCGGGC 

1201 ACAGGCTGGG GGCTTTTAAA 

1251 CCTCGACCTC AACAACACCG 

1301 TGCAAACCAC TTTGGGCTTC 

1351 CGCTTTCCTG AAGAATTGGG 

1401 CGGGCTTTAT TCCTATTTGG 

1451 CTCAAAAATC AACCATTGTC 

1501 TTCTACTTCG ATGCCGCGCT 

1551 CACCAATGCA ATCAACTACC 

1601 GCTCGGAAAA CGAATTTAAG 

1551 AAGGAACATT GCGACCCGAG 

17 01 ATACGGCAAA AAGCGCGCCA 

17 51 TCGGCGATTA TTTCATGCCG 



GAAGCCGATT TGTTTTTATC TTATGGGTGT 
ATGCCGAAGA TGCAGGGCGC GCGGGCAGCG 
GAAGATGTGC ACGTCAAGGC GAAGCGCGTA 
TACCGATGCG CGTGCCGTAT CGACCCGTca 
AAAACCTCGA CAACATCGTA CGCAGCATAC 
GATAAAAGCT CGGGCATTGT GTCTTTGAAT 
CGGGCGGGTC AATACGATGG TGGACGGCAT 
CTTCTACCGA TGCGGGCAGG GCAGGCGGTT 
GTCGACAGCA ATTTTATTGC CGGACTGGAT 
CGGCTCGGCA GGCATCAACA GCCTTGCCGG 
TAGGCGTGGA TGACGTCGTT CAGGGCAATA 
AAAGGTCTGA CCGGCACCAA TTCAACCAAA 
AGGTGCGCGC AAATGGCTGG AAAGCGGAGC 
GGCACAGCAG GCGCGGCGTG GCGCAAAATT 
CAGCACATCG GAAATTTTGG TGAAGAATAT 
ATATTTTGTA CAAGAGGGTG GTTTGAAATT 
GGGAACGGGA TTTGCAAAGG CAATACTGGA 
TACGAAGACC CCCAAGAACT GCAAAAATAC 
CTGGCGGGAA AACCTGGCGC CGCAATACGA 
CCGGCCTGAA GCAGCAGTCG GCAGGCAATC 
GGCGTATTCA ATAAATACAC GGCGCAATTT 
CGGCAGCCGC AAAATCATCA ACCGCAATTA 
CTTTGAACCC GTATACCAAC CTCAATCTGA 
AGGCAGAAAT ATCCGAAAGG GGCGAAGTTT 
AGATTTTGAA ACCTACAACA ACGCGAAAAT 
CCACCTTCCG GCTGCCCCGC GAAACCGAGT 
AATTATTTCC ACAACGAATA CGGCAAAAAC 
GCTGTTTTTC GACGGTCCTG ATCAGGACAA 
GGCGGTTTAA GGGCGATAAA GGGCTGTTGC 
CAACCGGCCG GCAGCCAATA TTTCAACACG 
CAAAAAAGAC ATTTACCGCT TAAACTACAG 
GTTTCGGCGG CGAATATACG GGCTATTACG 
CGGGCATTCG GAGAAAACTC GCCGGCATAC 
CTGCGGGCTT TATGAACCCG TATTGAAAAA 
ACAACCATTC GGTCAGCATT AGTGCGGACT 
TTCGCCGGCT ATTCGCGCAC ACACCGTATG 



wo 99/24578 



-480- 



PCT/IB98/01665 



18 01 CCCAACATCC AAGAAATGTA TTTTTCCCAA ATCGGCGACT CCGGCGTTCA 

18 51 CACCGCCTTA AAACCAGAGC GCGCAAACAC TTGGCAATTT GGCTTCAATA 

1901 CCTATAAAAA AGGATTGTTA AAACAAGATG ATATATTAGG ATTGAAACTG 

1951 GTCGGCTACC GCAGCCGCAT TGACAACTAC ATCCACAACG TTTACGGGAA 

2001 ATGGTGGGAT TTGAACGGGG ATATTCCGAG CTGGGTCGGC AGCACCGGGC 

2051 TTGCCTACAC CATCCGACAC CGCAATTTCA AAGACAAAGT GCACAAACAC 

2101 GGTTTTGAGC TGGAGCTGAA TTACGATTAT GGGCGTTTTT TCACCAACCT 

2151 TTCTTACGCC TATCAAAAAA GCACGCAACC GACCAATTTC AGCGATGCGA 

2201 GCGAATCGCC CAACAATGCC tCCaaAGAAG ACCAACTCAA ACAAGGTTAT 

2251 GGGCTGAGCA GGGTTTCCGC CCTGCCGCGA GATTACGGAC GTTTGGAAGT 

2301 CGGTACGCGC TGGTTGGGCA ACAAACTGAC TTTGGGCGGC GCGAtgcGCT 

2351 ATTTCGGCAA GAGCATCCGC GCGACGGCTG AAGAACGCTA TATCGACGGC 

24 01 ACCAACGGGG GAAATACCAG CAATGTCCGG CAACTGGGCA AGCGTTCCAT 

24 51 CAAACAAACC GAAACCCTTG CCCGACAGCC TTTGATTTTT GATTTTTACG 

2501 CCGCTTACGA GCCGAAGAAA AACCTTATTT TCCGCGCCGA AGTCAAAAAC 

2551 CTGTTCGACA GGCGTTATAT CGATCCGCTC GATGCGGGCA ATGATGCGGC 

2 601 AACGCAGCGT TATTACAGCT CGTTCGACCC GAAAGACAAG GACGAAGACG 

2651 TAACGTGTAA TGCTGATAAA ACGTTGTGCA ACGGCAAATA CGGCGGCACA 

2701 AGCAAAAGCG TATTGACCAA TTTCGCACGC GGACGCACCT TCTTGATGAC 

2751 GATGAGCTAC AAGTTTTAA 

This corresponds to the amino acid sequence <SEQ ID 884; ORF133ng-l>: 

1 MRSSFRLKPI CFYLMGVMLY HHSYA EDAGR AGSEAQIQVL EDVHVKAKRV 

51 PKDKKVFTDA RAVSTRQDVF KSGENLDNIV RSIPGAFTQQ DKSSGIVSLN 

101 IRGDSGFGRV NTMVDGITQT FYSTSTDAGR AGGSSQFGAS VDSNFIAGLD 

151 VVKGSFSGSA GINSLAGSAN LRTLGVDDW QGNNTYGLLL KGLTGTNSTK 

201 GNAMAAIGAR KWLESGASVG VLYGHSRRGV AQNYRVGGGG QHIGNFGEEY 

251 LERRKQQYFV QEGGLKFNAG SGKWERDLQR QYWKTKWYKK YEDPQELQKY 

301 lEEHDKSWRE NLAPQYDITP IDPSGLKQQS AGNLFXLEYD GVFNKYTAQF 

351 RDLNTRIGSR KIINRNYQFN YGLSLNPYTN LNLTAAYNSG RQKYPKGAKF 

401 TGWGLLKDFE TYNNAKILDL NNTATFRLPR ETELQTTLGF NYFHNEYGKN 

451 RFPEELGLFF DGPDQDNGLY SYLGRFKGDK GLLPQKSTIV QPAGSQYFNT 

501 FYFDAALKKD lYRLNYSTNA INYRFGGEYT GYYGSENEFK RAFGENSPAY 

551 KEHCDPSCGL YEPVLKKYGK KRANNHSVSI SADFGDYFMP FAGYSRTHRM 

601 PNIQEMYFSQ IGDSGVHTAL KPERANTWQF GFNTYKKGLL KQDDILGLKL 

651 VGYRSRIDNY IHNVYGKWWD LNGDIPSWVG STGLAYTIRH RNFKDKVHKH 

701 GFELELNYDY GRFFTNLSYA YQKSTQPTNF SDASESPNNA SKEDQLKQGY 

751 GLSRVSALPR DYGRLEVGTR WLGNKLTLGG AMRYFGKSIR ATAEERYIDG 

801 TNGGNTSNVR QLGKRSIKQT ETLARQPLIF DFYAAYEPKK NLIFRAEVKN 

851 LFDRRYIDPL DAGNDAATQR YYSSFDPKDK DEDVTCNADK TLCNGKYGGT 

901 SKSVLTNFAR GRTFLMTMSY KF* 

ORP133ng-l and ORF133-1 show 96.2% identity in 889 aa overlap; 

10 20 30 40 50 60 

orfl33ng-l.pep SFRLKPICFYLMGVMLYHHSYAEDAGRAGSEAQIQVLEDVHVKAKRVPKDKPCVFTDARAV 

I I I I I I I I I I I I M I I I I M I I I I I I I I I I 
orfl33-l EAQ IQVLEDVHVKAKRVPKDKKVFT DARAV 

10 20 30 

70 80 90 100 110 120 

STRQDVFKSGENLDNIVRSIPGAFTQQDKSSGIVSLNIRGDSGFGRVNTMVDGITQTFYS 

STRQDIFKSSENLDNIVRSIPGAFTQQDKSSGIVSLNIRGDSGFGRVNTMVDGITQTFYS 
40 50 60 70 80 90 

130 140 150 160 170 180 

orfl33ng-l.pep TSTDAGRAGGSSQFGASVDSNFIAGLDWKGSFSGSAGINSLAGSANLRTLGVDDWQGN 

orf 133-1 TSTDAGRAGGSSQFGASVDSNFIAGLDWKGSFSGSAGINSLAGSANLRTLGVDDWQGN 
100 110 120 130 140 150 

190 200 210 220 230 240 

orfl33ng-l.pep NTYGLLLKGLTGTNSTKGNAMAAIGARKWLESGASVGVLYGHSRRGVAQNYRVGGGGQHI 

orf 133-1 NTYGLLLKGLTGTNSTKGNAMAAIGARKWLE3GASVGVLYGHSRRSVAQNYRVGGGGQHI 
160 170 180 190 200 210 



orf 133ng-l .pep 
orfl33-l 



250 260 270 280 290 300 

orf 133ng-l . pep GNFGEEYLERRKQQYFVQEGGLKFNAGSGKWERDLQRQYWKTKWYKKYEDPQELQKYIEE 
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GNFGAEYLERRKQRYFVQEGALKFNSDSGKWERDLQRQQWKYKPYKNYNN-QELQKYIEE 



)rfl33ng-l.pep HDKSWRENLAPQYDITPIDPSGLKQQSAGNLFKLEYDGVFNKYTAQFRDLKTRIGSRKII 
I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I i I I I I ! I I I I : I I I I I I I 
)rf 133-1 HDKSWRENLXPQYDITPIDPSSLKQQSAGNLFKLEYDGVFNKYTAQFRDLNTKIGSRKII 



irfl33ng-l.pep NRNYQFNYGLSLNPYTNLNLTAAYNSGRQKYPKGAKFTGWGLLKDFETYNNAKILDLNNT 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I i I I I I I i I I I I I I I I I I I ! 
1 r f 1 3 3 - 1 NRNYQFNYGLSLNPYTNLNLTAAYNSGRQKYPKGSKFTGWGLLKDFETYNNAKILDLNNT 



orfl33ng-l.pep ATFRLPRETELQTTLGFNYFHNEYGKNRFPEELGLFFDGPDQDNGLYSYLGRFKGDKGLL 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 133-1 ATFRLPRETELQTTLGFNYFHNEYGKNRFPEELGLFFDGPDQDNGLYSYLGRFKGDKGLL 



orfl33ng-l.pep PQKSTIVQPAGSQYFNTFYFDAALKKDIYRLNYSTNAINYRFGGEYTGYYGSENEFKRAF 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ::: I I I I I I I I I I I I I I I I I I I 
orf 133-1 PQKSTIVQPAGSQYFNTFYFDAALKKDIYRLNYSTNTVGYRFGGEYTGYYGSDDEFKRAF 



orfl33ng-l.pep GENSPAYKEHCDPSCGLYEPVLKKYGKKRANNHSVSISADFGDYFMPFAGYSRTHRMPNI 
orf 133-1 GENSPTYKKHCNRSCGIYEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNI 



orfl33ng-l.pep QEMYFSQIGDSGVHTALKPERANTWQFGFNTYKKGLLKQDDILGLKLVGYRSRIDNYIHN 
Orfl33-l QEMYFSQIGDSGVHTALKPERANTWQFGFNT YKKGLLKQDDTLGLKLVGYRSRIDNYIHN 



orf 133ng-l . pep VYGKWWDLNGDIPSWVGSTGLAYTIRHRNFKDKVHKHGFELELNYDYGRFFTNLSYAYQK 
orf 133-1 VYGECWWDLNGDIPSWVSSTGLAYTIQHRNFKDKVHKHGFELELNYDYGRFFTNLSYAYQK 



orf 133ng-l . pep STQPTNFSDASESPNNASKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMR 
orf 133-1 STQPTNFSDASESPNNASKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMR 



orf 133ng-l . pep FRAEVKNLFDRRYIDPLDAGNDAATQRYYSSFDPKDKDEDVTCNADKTLCNGKYGGTSKS 

I ! I M I I I I I ! I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 133-1 FRAEVKNLFDRRYIDPLDAGNDAATQRYYSSFDPKDKDEDVTCNADKTLCNGKYGGT3K3 



orfl33ng-l .pep VLTNFARGRTFLMTMSYKFX 
orfl33-l VLTNFARGRTFLMTMSYKFX 



70 In addition, ORF133ng-l is homologous to a TonB-dependent receptor in H.influenzae: 
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sp|P45114 |YC17_HAEIN PROBABLE TONB-DEPENDENT RECEPTOR HI1217 PRECURSOR 
>gi 1 1075372 Ipir I IG64110 transferrin binding protein 1 precursor (tbpl) homolog - 
Haemophilus influenzae (strain Rd KW20) >gi 1 1574147 (U32801) transferrin binding 
protein 1 precursor (tbpl) [Haemophilus influenzae] Length = 913 
Score = 930 bits (2377), Expect = 0.0 

Identities = 476/921 (51%), Positives = 619/921 (56%), Gaps = 72/921 (7%) 

Query: 38 QVLEDVHVJCAKRVPKDKKVFTDARAVSTRQDVFKSGENLDNIVRSIPGAFTQQDKSSGIV 97 

+ L + V K + DKK FT+A+A STR++VFK + +D ++RSIPGAFTQQDK SG+V 
Sbjct: 29 ETLGQIDWEKVISNDKKPFTEAKAKSTRENVFKETQTIDQVIRSIPGAFTQQDKGSGW 88 

Query: 98 SLNIRGDSGFGRVNTMVDGITQTFYSTSTDAGRAGGSSQFGASVDSNFIAGLDWKGSFS 157 

S+NIRG++G GRVNTMVDG+TQTFYST+ D+G++GGSSQFGA++D NFIAG+DV K +FS 
Sbjct: 89 SVNIRGENGLGRVNTMVDGVTQTFYSTALDSGQSGGSSQFGAAIDPNFIAGVDVNKSNFS 148 

Query: 158 GSAGINSLAGSANLRTLGVDDVVQXXXXXXXXXXXXXXXXXXXXXAMAAIGARKWLESGA 217 

G++GIN+LAGSAN RTLGV+DV+ M RKWL++G 

Sbjct: 149 GASGINALAGSANFRTLGVNDVITDDKPFGIILKGMTGSNATKSNFMTMAAGRKWLDNGG 208 

Query: 218 SVGVLYGHSRRGVAQNYRVGGGGQHIGNFGEEYLERRKQQYFVQEGGLKFNAGSGKWERD 277 

VGV+YG+S+R V+Q+YR+ GGG+ + + G++ L + K+ YF + G N G+W D 
Sbjct: 209 YVGWYGYSQREVSQDYRI-GGGERLASLGQDILAKEKEAYF-RNAGYILNP-EGQWTPD 265 

Query: 278 LQRQYWK TKWY KKYEDPQELQK YIEE 303 

L +++W +Y KK +D ++LQK lEE 

Sbjct: 266 LSKKHWSCNKPDYQKNGDCSYYRIGSAAKTRREILQELLTNGKKPKDIEKLQKGNDGIEE 325 

Query: 304 HDKSWRENLAPQYDITPIDPSGLKQQSAGNLFKLEYDGVFNKYTAQFRDLNTRIGSRKII 363 

DKS+ N QY + PI+P L+ +S +L K EY AQ R L+ +IGSRKI 

Sbjct: 326 TDKSFERN-KDQYSVAPIEPGSLQSRSRSHLLKFEYGDDHQNLGAQLRTLDNKIGSRKIE 384 

Query: 364 NRNYQFNYGL3LNPYTNLNLTAAYNSGRQKYPKGAKFTGWGLLKDFETYNNAKILDLNNT 423 

NRNYQ NY + N Y +LNL AA+N G+ YPKG F GW + T N A I+D+NN+ 

Sbjct: 38 5 NRNYQVNYNFNNN3YLDLNLMAAHNIGKTIYPKGGFFAGWQVADKLITKNVANIVDINNS 4 44 

Query: 424 ATFRLPRETELQTTLGFNYFHNEYGKNRFPEELGLFFDGPDQDNGLYSY— LGRFKGDKG 4 81 

TF LP+E +L+TTLGFNYF NEY KNRFPEEL LF++ D GLYS+ GR+ G K 
Sbjct: 445 HTFLLPKEIDLKTTLGFNYFTNEYSKNRFPEEL3LFYNDASHDQGLYSHSKRGRYSGTKS 504 

Query: 482 LLPQKSTIVQPAGSQYFNTFYFDAALKKDIYRLNYSTNAINYRFGGEYTGYYGSENEFKR 541 

LLPQ+S I+QP+G Q F T YFD AL K lY LNYS N +Y F GEY GY 
Sbjct: 505 LLPQRSVILQPSGKQKFKTVYFDTALSKGI YHLNYSVNFTHYAFNGEYVGY 555 

Query: 54 2 AFGENSPAYKEHCDPSCGLYEPVLKKYGKKRANNHSVSISADFGDYFMPFAGYSRTHRMP 601 

EN+ + + EP+L K G K+A NHS ++3A+ DYFMPF YSRTHRMP 

Sbjct: 556 ENTAGQQ INEPILHKSGHKKAFNHSATL3AELSDYFMPFFTYSRTHRMP 604 

Query: 602 NIQEMYFSQIGDSGVHTALKPERANTWQFGFNTYKKGLLKQDDILGLKLVGYRSRIDNYI 661 

NXQEM+FSQ+ ++GV+TALKPE+++T+Q GFNTYKKGL QDD+LG+KLVGYRS I NYI 
Sbjct: 605 NIQEMFFSQVSNAGVNTALKPEQSDTYQLGFNTYKKGLFTQDDVLGVKLVGYRSFIKNYI 664 

Query: 662 HNVYGKWWDLNGDIPSWVGSTGLAYTIRHRNFKDKVHKHGFELELNYDYGRFFTNLSYAY 721 

HNVYG WW +P+W S G YTI H+N+K V K G ELE+NYD GRFF N+SYAY 

Sbjct: 665 HNVYGVWW— RDGMPTWAESNGFKYTIAHQNYKPIVKKSGVELEINYDMGRFFANVSYAY 722 

Query: 722 QKSTQPTNFSDASESPNNASKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGA 781 

Q++ QPTN++DAS PNNAS+ED LKQGYGLSRVS LP+DYGRLE+GTRW KLTLG A 
Sbjct: 723 QRTNQPTNYADASPRPNNASQEDILKQGYGLSRVSMLPKDYGRLELGTRWFDQKLTLGLA 782 

Query: 782 MRYFGKSIRATAEERYIDGTNGGNTSNVRQLGKRSIKQTETLARQPLIFDFYAAYEPKKN 841 

RY+GKS RAT EE YI+G+ + +R+ ++K+TE + +QP+I D + +YEP K+ 

Sbjct: 783 ARYYGKSKRATIEEEYINGSR-FKKNTLRRENYYAVKKTEDIKKQPIILDLHVSYEPIKD 841 

Query: 842 LIFRAEVKNLFDRRYIDPLDAGNDAATQRYYSSFDPKDKDEDVTCNADKTLCNGKYGGTS 901 

LI +AEV+NL D+RY+DPLDAGNDAA+QRYYSS + + C D + C GG+ 
Sbjct: 842 LIIKAEVQNLLDKRYVDPLDAGNDAASQRYYSSL NNSIECAQDSSAC GGSD 892 

Query: 902 KSVLTNFARGRTFLMTMSYKF 922 

K+VL NFARGRT++++++YKF 
Sbjct: 893 KTVLYNFARGRTYILSLNYKF 913 



wo 99/24578 



-483- 



PCT/IB98/01665 



The underlined motif in the gonococcal protein (also present in the meningococcal protein) is 
predicted to be an ATP/GTP-binding site motif A (P-loop), and the analysis suggests that these 
proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for 
vaccines or diagnostics, or for raising antibodies. 

Example 104 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 885> 

1 ATGAACCTGA TTTCACGTTA CATCATCCGT CAAATGGCGG TTATGGCGGT 

51 TTACGCGCTC CTTGCCTTCC TCGCTTTGTA CAGCTTTTTT GAAATCCTGT 

101 ACGAAACCGG CAACCTCGGC AAAGGCAGTT ACGGCATATG GGAAATGCTG 

151 GGCTACACCG CCCTCAAAAT GCCCGCCCGC GCCTACGAAC TGATTCCCCT 

201 CGCCGTCCTT ATCGGCGGAC TGGTCTCCCT CAGCCAGCTT GCCGCCGGCA 

251 GCGAACTGAC CGTCATCAAA GCCAGCGGCA TGAGCACCAA AAAGCTGCTG 

301 TTGATTCTGT CGCAGTTCGG TTTTATTTTT GCTATTGCCA CCGTCGCGCT 

351 CGGCGAATGG GTTGCGCCCA CACTGAGCCA AAAAGCCGAA AACATCAAAG 

401 CCGCCGCCAT CAACGGCAAA ATCAGCACCG GCAATACCGG CCTTTGGCTG 

4 51 AAAGAAAAAA ACAGCGTGAT CAATGTGCGC GAAATGTTGC CCGACCAT . . 

This corresponds to the amino acid sequence <SEQ ID 886; ORFl 12>: 

1 MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEML 

51 GYTALKMPAR AYE LIPLAVL IGGLVSLSQL AAGSELTVIK ASGMSTKKLL 

101 LILSQFGFIF AIATV ALGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL 

151 KEKNSVINVR EMLPDH . . . 

Further work revealed further partal nucleotide sequence <SEQ ID 887>: 

1 ATGAACCTGA TTTCACGTTA CATCATCCGT CAAATGGCGG TTATGGCGGT 

51 TTACGCGCTC CTTGCCTTCC TCGCTTTGTA CAGCTTTTTT GAAATCCTGT 

101 ACGAAACCGG CAACCTCGGC AAAGGCAGTT ACGGCATATG GGAAATGCTG 

151 gGCTACACCG CCCTCAAAAT GCCCGCCCGC GCCTACGAAC TGATTCCCCT 

201 CGCCGTCCTT ATCGGCGGAC TGGTCTCCCT CAGCCAGCTT GCCGCCGGCA 

251 GCGAACTGAC CGTCATCAAA GCCAGCGGCA TGAGCACCAA AAAGCTGCTG 

301 TTGATTCTGT CGCAGTTCGG TTTTATTTTT GCTATTGCCA CCGTCGCGCT 

351 CGGCGAATGG GTTGCGCCCA CACTGAGCCA AAAAGCCGAA AACATCAAAG 

4 01 CCGCCGCCAT CAACGGCAAA ATCAGCACCG GCAATACCGG CCTTTGGCTG 

451 AAAGAAAAAA ACAGCrTkAT CAATGTGCGC GAAATGTTGC CCGACCATAC 

501 GCTTTTGGGC ATCAAAATTT GGGCGCGCAA CGATAAAAAC GAATTGGCAG 

551 AGGCAGTGGA AGCCGATTCC GCCGTTTTGA ACAGCGACGG CAGTTGGCAG 

601 TTGAAAAACA TCCGCCGCAG CACGCTTGGC GAAGACAAAG TCGAGGTCTC 

651 TATTGCGGCT GAAGAAAACT GGCCGATTTC CGTCAAACGC AACCTGATGG 

701 ACGTATTGCT CGTCAAACCC GACCAAATGT CCGTCGGCGA ACTGACCACC 

751 TACATCCGCC ACCTCCAAAA CAACAGCCAA AACACCCGAA TCTACGCCAT 

801 CGCATGGTGG CGCAAATTGG TTTACCCCGC CGCAGCCTGG GTGATGGCGC 

851 TCGTCGCCTT TGCCTTTACC CCGCAAACCA CCCGCCACGG CAATATGGGC 

901 TTAAAACTCT TCGGCGGCAT CTGTsTCGGA TTGCTGTTCC ACCTTGCCGG 

951 ACGGCTCTTT GGGTTTACCA GCCAACTCGG . . . 

This corresponds to the amino acid sequence <SEQ ID 888; ORFl 12-1>: 

1 MNLISRYIIR QMAVHAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEML 

51 GYTALKMPAR A YE LIPLAVL IGGLVSLSQL AAGSELTVIK ASGMSTKKLL 

101 LILSQFGFIF AIATV ALGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL 

151 KEKNSXINVR EMLPDHTLLG IKIWARNDKN ELAEAVEADS AVLNSDGSWQ 

201 LKNIRRSTLG EDKVEVSIAA EENWPISVKR NLMDVLLVKP DQMSVGELTT 

251 YIRHLQNNSQ NTRIYAIAWW RK LVYPAAAW VMALVAFAF T PQTTRHGNMG 

301 LKLFGGICXG LLFHLA GRLF GFTSQL... 

Computer analysis of this amino acid sequence predicts two transmembrane domains and gave the 



following results: 
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Homologv with a predicted ORF from N.meninsitidis (strain A) 

ORFl 12 shows 96.4% identity over a 166aa overlap with an ORF (ORFl 12a) from strain A oiN. 
meningitidis: 



orfll2.pep 
orfll2a 



MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALfCMPAR 
MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMXGYTALKMXAR 



orfll2.pep AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 
or f 112a AYELMPLAVLIGGLVSXSQLAAGSELXVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 



)rfll2.pep VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSVINVREMLPDH 

)rfll2a VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSIINVREMLPDHTLLGIKIWARNDKN 



orfll2a ELAEAVEADSAVLNSDGSWQLKNIRRSTLGEDKVEVSIAAEEXWPISVKRNLMDVLLVKP 
190 200 210 220 230 240 

The ORFl 12a nucleotide sequence <SEQ ID 889> is: 





ATGAACCTGA 


TTTCACGTTA 


51 


TTACGCGCTC 


CTTGCCTTCC 


101 


ACGAAACCGG 


CAACCTCGGC 


151 


GGNTACACCG 


CCCTCAAAAT 


201 


CGCCGTCCTT 


ATCGGCGGAC 


251 


GCGAACTGAN 


CGTCATCAAA 


301 


TTGATTCTGT 


CGCAGTTCGG 


351 


CGGCGAATGG 


GTTGCGCCCA 


401 


CCGCGGCCAT 


CAACGGCAAA 


451 


AAAGARAAAA 


ACAGCATTAT 


501 


CCTGCTGGGC 


ATTAAAATCT 


551 


AGGCAGTGGA 


AGCCGATTCC 


601 


TTGAAAAACA 


TCCGCCGCAG 


651 


TATTGCGGCT 


GAAGAAAANT 


701 


ACGTATTGCT 


CGTCAAACCC 


751 


TACATCCGCC 


ACCTCCAAAN 


801 


CGCATGGTGG 


CGCAAATTGG 


851 


TCGTCGCCTT 


TGCCTTTACC 


901 


TTAAAANTCT 


TCGGCGGCAT 


951 


NCGGCTCTTC 


NGGTTTACCA 


1001 


NCGGCGCACT 


ACCTACCATA 


1051 


CGCAAACAGG 


AAAAACGCTA 


This encodes a 


protein having the amino i 



CATCATCCGT 
TCGCTTTGTA 
AAAGGCAGTT 
GNCCGCCCGC 
TGGTCTCTNT 
GCCAGCGGCA 
TTTTATTTTT 
CACTGAGCCA 
ATCAGTACCG 
CAATGTGCGC 
GGGCCCGCAA 
GCCGTTTTGA 
CACGCTTGGC 
GGCCGATTTC 
GACCAAATGT 
NNACAGCCAA 
TTTACCCCGC 
CCGCAAACCA 
CTGTCTCGGA 
GCCAACTCTA 
GCCTTCGCCT 



CAAATGGCGG 
CAGCTTTTTT 
ACGGCATATG 
GCCTACGAAC 
CAGCCAGCTT 
TGAGCACCAA 
GCTATTGCCA 
AAAAGCCGAA 
GCAATACCGG 
GAAATGTTGC 
CGATAAAAAC 
ACAGCGACGG 
GAAGACAAAG 
CGTCAAACGC 
CCGTCGGCGA 
AACACCCGAA 
CGCAGCCTGG 
CCCGCCACGG 
TTGCTGTTCC 
CGGCATCCCG 
TGCTCGCCGT 



TTATGGCGGT 
GAAATCCTGT 
GGAAATGNTG 
TGATGCCCCT 
GCCGCCGGCA 
AAAGCTGCTG 
CCGTCGCGCT 
AACATCAAAG 
CCTTTGGCTG 
CCGACCATAC 
GAACTGGCAG 
CAGTTGGCAG 
TCGAGGTCTC 
AACCTGATGG 
ACTGACCACC 
TCTACGCCAT 
GTGATGGCGC 
CAATATGGGC 
ACCTTGCCGG 
CCCTTCCTCG 
TTGGCTGATA 



1 MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEMX 

51 GYTALKMXAR A YE LMPLAVL IGGLVSXSQ L AAGSELXVIK ASGMSTKKLL 

101 LILSQFGFIF AIATV ALGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL 

151 KEKNSIINVR EMLPDHTLLG IKIWARNDPCN ELAEAVEADS AVLNSDGSWQ 

201 LKNIRRSTLG EDKVEVSIAA EEXWPISVKR NLMDVLLVKP DQMSVGELTT 

251 YIRHLQXXSQ NTRIYAIAWW RK LVYPAAAW VMALVAFAF T PQTTRHGNMG 

301 LKXFGGICLG LLFHL AGRLF XFTSQLYGIP PFLXGALPTI AFALLAVWLI 

351 RKQEKR- 

ORFl 12a and ORFl 12-1 show 96.3% identity in 326 aa overlap: 



MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMXGYTALKMXAR 

I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I M I I I I I I I I I I I I I I I I I II 

MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR 



orfll2a.pep 



AYELMPLAVLIGGLVSXSQLAAGSELXVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 
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orf 112-1 AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 
orf 112a . pep VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSIINVREMLPDHTLLGIKIWARNDKN 
orf 112-1 VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSXINVREMLPDHTLLGIKIWARNDKN 
orf 112a . pep ELAEAVEADSAVLNSDGSWQLKNIRRSTLGEDKVEVSIAAEEXWPISVKRNLMDVLLVKP 
orf 112-1 ELAEAVEADSAVLNSDGSWQLKNIRRSTLGEDKVEVSIAAEENWPISVKRNLMDVLLVKP 
DQMSVGELTTYIRHLQXXSQNTRIYAIAWWRKLVYPAAAWVMALVAFAFTPQTTRHGNMG 
DQMSVGELTTYIRHLQNNSQNTRIYAIAWWRKLVYPAAAWVMALVAFAFTPQTTRHGNMG 



orfll2a.pep 



orf 112a . pep LKXFGGICLGLLFHLAGRLFXFTSQLYGIPPFLXGALPTIAFALLAVWLIRKQEKRX 

II I I I I I I I I I I I I I I I I I I I I I 
orf 112-1 LKLFGGICXGLLFHLAGRLFGFTSQL 

Homology with a predicted ORF from N.sonorrhoeae 

ORF 1 12 shows 95.8% identity over 166aa overlap with a predicted ORF (0RF112ng) from N. 
gonorrhoeae: 

orf 112 . pep MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR 60 

orf 112ng MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR 60 

orf 112 .pep AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 120 

orfll2ng AYELMPLAVLIGGLASLSQLAAGSELAVIKASGMSTKKLLLILSQFGFIFAIAAVALGEW 120 

orfll2.pep VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSVINVREMLPDH 166 

orfll2ng VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKTSIINVRGMLPDHTLLGIKIWARNDKN 18 0 

The complete length ORFl 12ng nucleotide sequence <SEQ ID 891 > is: 



ATGAACCTGA 
TTACGCGCTC 
ACGAAACCGG 
GGCTACACCG 
CGCCGTCCTC 
GCGAACTGGC 
TTGATTCTGT 
CGGCGAATGG 
cCGCCGCCAt 
AAAGAAAAAa 
GCTTTTGGGC 
AGGCAGTGGA 
TTGAAAAACA 
cgCCGCCGCC 
ACGTATTGCT 
TACATCCGCC 
CGCATGGTGG 
TCGTTGCCTT 
TTAAAACTCT 
CAGGCTCTTC 
CCGGCGCACT 
CGCAAACAGG 



TTTCACGTTA 
CTTGCCTTCC 
CAACCTCGGC 
CCCTCAAAAT 
ATCGGCGGAC 
CGTCATCAAA 
CTCAGTTCGG 
GTTGCGCCCA 
taacggCAAA 
ccAGCATTAT 
ATCAAAATTT 
AGCCGATTCC 
TCCGCCGCAG 
GAAGAAACTT 
CGTCAAGCCC 
ACCTCCAAAA 
CGTAAACTCG 
CGCCTTTACG 
TCGGCGGCAT 
GGGTTTACCA 
GCCTACCATA 
AAAAACGTTG 



CATCATCCGC 
TCGCTTTGTA 
AAAGGCAGTT 
GCCCGCCCGC 
TGGCCTCTCT 
GCCAGCGGCA 
TTTTATTTTT 
CGCTGAGCCA 
ATCAGCAccg 
CAATGTGcGc 
GGGCGCGCAA 
GCCGTTTTGA 
CATCATGGGT 
gGCCGATTGC 
GACCAAATGT 
CAACAGCCAA 
TTTACCCCGT 
CCGCAAACCA 
CTGTCTCGGA 
GCCAACTCTA 
GCCTTCGCCT 



CAAATGGCGG 
CAGCTTTTTT 
ACGGCATATG 
GCCTACGAAC 
CAGCCAGCTT 
TGAGCACCAA 
GCTATTGCCG 
AAAAGCCGAA 
gcAATACCGG 
GGAATGTTGC 
CGATAAAAAC 
ACAGCGACGG 
ACAGACAAAA 
CGTCAGACGC 
CCGTCGGCGA 
AACACCCAAA 
CGCCGCATGG 
CGCGCCACGG 
TTGCTGTTCC 
CGGCACCCCA 
TGCTCGCTGT 



TTATGGCGGT 
GAAATCCTGT 
GGAAATGCTG 
TCATGCCCCT 
GCCGCCGGCA 
AAAGCTGCTG 
CCGTCGCGCT 
AACATCAAag 
CCTTTggcTG 
CCGACCATAC 
GAATTGGCAG 
CAGCTGGCAG 
TCGAAACATC 
AACCTGATGG 
GCTGACCACC 
TCTACGCCAT 
GTCATGGCGC 
CAATATGGGC 
ACCTTGCCGG 
CCCTTCCTCG 
TTGGCTGATA 



This encodes a protein having amino acid sequence <SEQ ID 892>: 

1 MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEML 

51 GYTALKMPAR A YE LMPLAVL IGGLASLSQL AAGSELAVIK ASGMSTKKLL 

101 LILSQFGFIF AIAAVA LGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL 

151 KEKTSIINVR GMLPDHTLLG IKIWARNDKN ELAEAVEADS AVLNSDGSWQ 

201 LKNIRRSIMG TDKIETSAAA EETWPIAVRR NLMDVLLVKP DQMSVGELTT 

251 YIRHLQNNSQ NTQIYAIAWW RK LVYPVAAW VMALVAFAF T PQTTRHGNMG 

301 LKLFGGICLG LLFHLA GRLF GFTSQLYGTP PFL AGALPTI AFALLAVWLI 
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351 RKQEKR* 

ORFl 12ng and ORFl 12-1 show 94.2% identity in 326 aa overlap: 



MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR 
MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR 



AYELMPLAVLIGGLASLSQLAAGSELAVIKASGMSTKKLLLILSQFGFIFAIAAVALGEW 
AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 



VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKTSIINVRGMLPDHTLLGIKIWARNDKN 
VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSXINVREMLPDHTLLGIKIWARNDKN 



ELAEAVEADSAVLNSDGSWQLKNIRRSIMGTDKIETSAAAEETWPIAVRRNLMDVLLVKP 
ELAEAVEADSAVLNSDGSWQLKNIRRSTLGEDKVEVSIAAEENWPISVKRNLMDVLLVKP 



DQMSVGELTTYIRHLQNNSQNTQIYAIAWWRKLVYPVAAWVMALVAFAFTPQTTRHGNMG 
DQMSVGELTTYIRHLQNNSQNTRIYAIAWWRKLVYPAAAWVMALVAFAFTPQTTRHGNMG 



LKLFGGICLGLLFHLAGRLFGFTSQLYGTPPFLAGALPTIAFALLAVWLIRKQEKRX 
LKLFGGICXGLLFHLAGRLFGFTSQL 



This analysis suggests that these proteins from N. meningitidis and N. gonorrhoeae, and their 
40 epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



It will be appreciated that the invention has been described by means of example only, and that 
modifications may be made whilst remaining within the spirit and scope of the invention. 
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TABLE I - PGR primers 



ORT 


Primer 


Sequence 


Restriction sites 


ORFl 


Forward 
Reverse 


CGCGGATCCGCTAGC-GGACACACTTATTTCGG 
CCCGCTCGAG-CCAGCGGTAGCCTAATT 


BamHI-Nhel 
Xliol 


ORF2 


Forward 
Reverse 


GCGGATCCCATATG-TTTGATTTCGGTTTGGG 
CCCGCTCGAG-GACGGCATAACGGCG 


BamHI-Ndel 
Xhol 


ORF 2-1 


Forward 
Reverse 


GCGGATCCCATATG-TTTGATTTCGGTTTGGG 
CCCGCTCGAG-TGATTTACGGACGCGCA 


BamHI-Ndel 
Xhol 


ORF4 


Forward 
Reverse 


GCGGATCCCATATG-TGCGGAGGTCAAAAAGAC 
CCC6CTCGAG-TTTGGCTGCGCCTTC 


BamHI-Ndel 
Xhol 


ORF 5 


Forward 
Forward 
Reverse 


GGAATTCCATATGGCCATGG-TGGAAGGCGCACAACC 

CGGGATCC-ATGGAAGGCGCACAAC 

CCCGCTCGAG-GACTGTGCAAAAACGG 


Ndel-Ncol 

BamHI 

Xhol 


ORF 6 


Forward 
Reverse 


CGCGGATCCCATATG-ACCCGTCAATCTCTGCA 
CCCGCTCGAG-TGCGCCGAACACTTTC 


BamHI-Ndel 
Xhol 


ORF? 


Forward 
Reverse 


CGCGGATCCGCTAGC-GCGCTGCTTTTTGTTCC 
CCCGCTCGAG-TTTCAAAATATATTTGCGGA 


BamHI-Miel 
Xhol 


ORF 8 


Forward 
Reverse 


GCGGATCCCATATG-GCTCAACTGCTTCGTAC 
CCCGCTCGAG-AGCAGGCTTTGGCGC 


BamHI-Ndel 
Xhol 


ORF 9 


Forward 
Reverse 


CGCGGATCCCATATG-CCGAAGGAAGTCGGAAA 
CCCGCTCGAG-TTTCCGAGGTTTTCGGG 


BamHI-Ndel 
Xhol 


ORF 10 


Forward 
Reverse 


GCGGATCCCATATG-GACACAAAAGAAATCCTC 
CCCGCTCGAG- TAATGGGAAACCTTGTTTT 


BamHI-Ndel 
Xhol 


ORF 11 


Forward 
Reverse 


GCGGATCCCATATG-GCGGTCAACCTCTACG 
CCCGCTCGAG-GGAAACGACTTCGCC 


BamHI-Ndel 
Xhol 


ORF 13 


Forward 
Reverse 


CGCGGATCCCATATG-GCTCTGCTTTCCGCGC 
CCCGCTCGAG-AGGGTGTGTGATAATAAG 


BamHI-Ndel 
Xhol 


ORF 15 


Forward 
Forward 
Reverse 


GGAATTCCATATGGCCATGG-GCGGGACACTGACAG 

CGGGATCC-TGCGGGACACTGACAGG 

CCCGCTCGAG-AGGTTGGCCTTGTCTATG 


Ndel-NcoI 

BamHI 

Xhol 


ORF 17 


Forward 


GGAATTCCATATGGCCATGG -TTGCCGGCCTGTTCG 


Ndel-NcoI 
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Forward 
Reverse 


CGGGATCC-ATTGCCGGCCTGTTCG 
CCCGCTCGAG-AAGCAGGTTGTACAGC 


BamHI 
Xhol 


ORF18 


Forward 
Reverse 


GCGGATCCCATATG-ATTTTGCTGCATTTGGAT 
CCCGCTCGAG-TCTTCCAATTTCTGAAAGC 


BamHI-Ndel 
Xhol 


ORF19 


Forward 
Forward 
Reverse 


GGAATTCCATATGGCCATGG -TCGCCAGTGTTTTTACC 

CGGGATCC-TTCGCCAGTGTTTTTACCG 

CCCGCTCGAG-GGTGTTTTTGAAGCTGCC 


Ndel-Ncol 

BamHI 

Xhol 


ORF 20 


Forward 
Forward 
Reverse 


GGAATTCCATATGGCCATGG -TCGGCGCGGGTATG 

CGGGATCC-TTCGGCGCGGGTATG 

CCCGCTCGAG-CGGCGAGCGAGAGCA 


Ndel-NcoI 

BamHI 

Xhol 


ORF 22 


Forward 
Forward 
Reverse 


GGAATTCCATATGGCCATGG-TGATTAAAATCAAAAAAGGTCT 

CGGGATCC-ATGATTAAAATCAAAAAAGGTCTAAACC 

CCCGCTCGAG-ATTATGATAGCG6CCC 


Ndel-NcoI 

BamHI 

Xhol 


ORF 23 


Forward 
Reverse 


CGCGGATCCCATATG-GATGTTTCTGTTTCAGAC 
CCC GCTCGAG- T T TAAACCGAT AGGT AAAC G 


BamHI-Ndel 
Xhol 


ORF 24 


Forward 
Forward 
Reverse 


GGAATTCCATATGGCCATGG -TGATGCCGGAAATGGTG 

CGGGATCC-ATGATGCCGGAAATGGTG 

CCCGCTCGAG-TGTCAGCGTGGCGCA 


Ndel-NcoI 

BamHI 

Xhol 


ORF 25 


Forward 
Reverse 


GCGGATCCCATATG-TATCGCAAACTGATTGC 
CCCGCTCGA6-ATCGATGGAATAGCCG 


BamHI-Ndel 
Xhol 


ORF 26 


Forward 
Reverse 


GCGGATCCCATATG -CAGCTGATCGACTATTC 
CCCGCTCGAG-GACATCGGCGCGTTTT 


BamHI-Ndel 
Xhol 


ORF 27 


Forward 
Forward 
Reverse 


GGAATTCCATATGGCCATGG-AGACCTATTCTGTT TA 
CGGGATCC- CAGACCTATTCTGTTTATTTTAATC 
CCCGCTCGAG- G GGT T C GAT TAAAT AACCAT 


Ndel-NcoI 

BamHI 

Xhol 


ORF 28 


Forward 
Forward 
Reverse 


GGAATTCCATATGGCCATGG-ACGGCTGTACGTTGATGT 

CGGGATCC-AACGGCTGTACGTTGATG 

CCCGCTCGAG-TTTGTCAGAGGAATTCGCG 


Ndel-NcoI 

BamHI 

Xhol 


ORF 29 


Forward 
Forward 


GCGGATCCCATATG -AACGGTTTGGATGCCCG 
CGCGGATCCGCTAGC-AACGGTTTGGATGCCCG 
CCCGCTCGAG-TTTGTCTAAGTTCCTGATATG 


BamHI-Ndel 
BamHI-Nhel 
Xhol 


ORF 32 


Forward 
Reverse 


CGCGGATCCCATATG-AATACTCCTCCTTTTG 
CCCGCTCGAG-GCGTATTTTTTGATGCTTTG 


BamHI-Ndel 
Xhol 


ORF 33 


Forward 
Reverse 


GCGGAT CCCAT ATG - AT T GATAGGGAT CGTAT G 
CCCGCTCGAG-TTGATCTTTCAAACGGCC 


BamHI-Ndel 
Xhol 
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ORF 35 


Forward 
Forward 
Reverse 


GCGGATCCCATATG-TTCAGAGCTCAGCTT 
CGCGGATCCGCTAGC-TTCAGAGCTCAGCTT 
CCCGCTCGAG-AAACAGCCATTTGAGCGA 


BamHI-Ndel 
BamHI-Nhel 
Xhol 


ORF 37 


Forward 
Reverse 


GCGGATCCCATATG-GATGACGTATCGGATTTT 
CCCGCTCGAG-ATAGCCCGCTTTCAGG 


BamHI-Ndel 
Xhol 


ORF 58 


Forward 
Reverse 


CGCGGATCCGCTAGC-TCCGAACGCGAGTGGAT 
CCCGCTCGAG-AGCATTGTCCAAGGGGAC 


BamHI-Nhel 
Xhol 


ORF 65 


Forward 

Forward 
Reverse 


GGAATTCCATATGGCCATGG -TGCTGTATCTGAATCAAG 

CGGGATCC-TTGCTGTATCTGAATCAAGG 
CCCGCTCGAG-CCGCATCG6CAGACA 


Ndel-Ncol 

BamHI 

Xhol 


ORF 66 


Forward 
Reverse 


GCGGATCCCATATG-TACGCATTTACCGCCG 
CCCGCTCGAG-TGGATTTTGCAGAGATGG 


BamHI-Ndel 
Xhol 


ORF 72 


Forward 
Reverse 


CGCGGATCCCATATG- AATGCAGTAAAAATATCTGA 
CCCGCTCGAG-GCCTGAGACCTTTGCAA 


BamHI-Ndel 
Xhol 


ORF 73 


Forward 
Reverse 


GCGGATCCCATATG-AGATTTTTCGGTATCGG 
CCCGCTCGAG-TTCATCTTTTTCATGTTCG 


BamHI-Ndel 
Xhol 


ORF 75 


Forward 
Reverse 


GCGGATCCCATATG- TCTGTCTTTCAAACGGC 
CCCGCTCGAG-TTTGTTTTTGCAAGACAG 


BamHI-Ndel 
Xhol 


ORF 76 


Forward 
Reverse 


GATCAGCTAGCCATATG-AAACAGAAAAAAACCGC 
CGGGATCC-TTACGGTTTGACACCGTT 


Nhel-Ndel 
BamHI 


ORF 79 


Forward 
Reverse 


CGCGGATCCCATATG-GTTTCCGCCGCCG 
CCCGCTCGAG-GTGCTGATGCGCTTCG 


BamHI-Ndel 
Xhol 


ORF 83 


Forward 
Reverse 


GCGGATCCCATATG-AAAACCCTGCTGCTGC 
CCCGCTCGAG-GCCGCCTTTGCGGC 


BamHI-Ndel 
Xhol 


ORF 84 


Forward 
Reverse 


GCGGATCCCATATG-GCAGAGATCTGTTTG 
CCCGCTCGAG-GTTTGCCGATCCGACCA 


BamHI-Ndel 
Xhol 


ORF 85 


Forward 
Reverse 


CGCGGATCCCATATG- GCGGTTTGGGGCGGA 
CCCGCTCGAG-TCGGCGCGGCGGGC 


BamHI-Ndel 
Xhol 


ORF 89 


Forward 
Forward 
Reverse 


GG AAT T CCATATGGCCATGG- CCAT ACC T T CT T ATC A 
CGGGATCC-GCCATACCTTCTTATCAGAG 
CCCGCTCGAG- TTTT TTGCGATTAGAAAAAGC 


Ndel-NcoI 

BamHI 

Xhol 


ORF 97 


Forward 


GCGGATCCCATATG-CATCCTGCCAGCGAAC 


BamHI-Ndel 
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Reverse 


CCCGCTCGAG-TTCGCCTACGGTTTTTTG 


Xhol 


ORF 98 


Forward 
Reverse 


GCGGATCCCATATG-ACGGT^ACTGCGG 
CCCGCTCGAG-TTGTTGTTCGGGCAAATC 


BamHI-Ndel 
Xhol 


ORFIOO 


Forward 
Reverse 


GCGGATCCCATATG-TCGGGCATTTACACCG 
CCCGCTCGAG-ACGGGTTTCGGCGGAA 


BamHI-Ndel 
Xhol 


ORF 101 


Forward 
Reverse 


GCGGATCCCATATG-ATTTATCAAAGAAACCTC 
CCCGCTCGAG-TTTTCCGCCTTTCAATGT 


BamHI-Ndel 
Xhol 


ORF 102 


Forward 
Reverse 


GCGGATCCCATATG-GCAGGGCTGTTTTACC 
CCCGCTCGAG-AAACGGTTTGAACACGAC 


BamHI-Ndel 
Xhol 


ORF 103 


Forward 
Reverse 


GCGGATCCCATATG-AACCACGACATCAC 
CCCGCTCGAG-CAGCCACAGGACGGC 


BamHI-Ndel 
Xhol 


ORF 104 


Forward 
Reverse 


GCGGATCCCATATG-ACGTGGGGAACGC 
CCCGCTCGAG-GCGGCGTTTGAACGGC 


BamHI-Ndel 
Xhol 


ORF 105 


Forward 
Reverse 


GCGGATCCCATATG-ACCAAATTTCAAACCCCTC 
CCCGCTCGAG-TAAACGAATGCCGTCCAG 


BamHI-Ndel 
Xhol 


ORF 106 


Forward 
Reverse 


GCGGATCCCATATG-AGGATAACCGACGGCG 
CCCGCTCGAG-TTTGTTCCCGATGATGTT 


BamHI-Ndel 
Xhol 


ORF 109 


Forward 
Reverse 


GCGGATCCCATATG-GAAGATTTATATATAATACTCG 
CCCGCTCGAG-ATCAGCTTCGAACCGAAG 


BamHI-Ndel 


ORFllO 


Forward 
Reverse 


AAAGAATTC-ATGAGTAAATCCCGTAGATCTCCC 
AAACTGCAG-GGAAAACCACATCCGCACTCTGCC 


EcoRI 
PstI 


ORFlll 


Forward 
Reverse 


AAAGAATTC-GCACCGCAAAAGGCAAAAACCGCA 
AAACTGCAG-TCTGCGCGTTTTCGGGCAGGGTGG 


EcoRI 
PstI 


ORF113 


Forward 
Reverse 


AAAGAATTC-ATGAACAAAACCCTCTATCGTGTGATTTTCAACCG 
AAACTGCAG-TTACGAATGCCTGCTTGCTCGACCGTACTG 


EcoRI 
PstI 


ORF115 


Forward 
Reverse 


AAAGAATTC-TTGCTTGTGCAAACAGAAAAAGACGG 
AAAAMGTCGAC-CTATTTTTTAGGGGC HTTGC ITGTTTGAAAAGCCTGCC 


EcoRI 
Sail 


ORF119 


Forward 
Reverse 


AAAGAATTC-TACAACATGTATCAGGAAAACCAATACCG 
AAACTGCAG-TTATGAAAACAGGCGCAGGGCGGTTTTGCC 


EcoRI 
PstI 


ORF120 


Forward 
Reverse 


AAAGAATTC-GCAAGGCTACCCCAATCCGCCGTG 
AAACTGCAG-CGGTTTGGCTGCCTGGCCGTTGAT 


EcoRI 
PstI 


ORF121 


Forward 
1 Reverse 


AAAGAATTC-GCCTTGGTCTGGCTGGTTTTCGC 
AAACTGCAG-TCATCCGCCACCCCACCTCGGCCATCCATC 


EcoRI 
PstI 
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ORF122 


Forward 
Reverse 


AAAAAAGTCGAC-ATGTCTTACCGCGCAAGCAGTTCrCC 
AAACTGCAG-TCAGGAACACAAACGATGACGAATATCCGTATC 


Sail 
PstI 


ORF125 


Forward 
Reverse 


AAAGAATTC-GCGCTGTTTTTTGCGGCGGCGTAT 
AAACTGCAG-CGCCGTTTCAAGACGAAAAAGTCG 


EcoRI 
PstI 


ORF126 


Forward 
Reverse 


AAAGAATTC-GCGGAAACGGTCGAAG 
AAACTGCAG-TTAATCTTGTCTTCCGATATAC 


EcoRI 
PstI 


ORF127 


Forward 
Reverse 


AAAGAATTC-ATGACTGATAATCGGGGGTTTACG 
AAAAAAGTCGAC-CTTAAGTAACTTGCAGTCCTTATC 


EcoRI 
Sail 


ORF128 


Forward 
Reverse 


AAAGAATTC-ATGCAAGCTGTCCGCTACAGGCC 

AAACTGCAG- CTA rrCCAATGCGCCGCC GCGGGAATG ITTGAGCAGGCG 


EcoRI 
PstI 


ORF129 


Forward 
Reverse 


AAAGAATTC-ATGGATTTTCGTTTTGACATTATTTACGAATACCG 
AAACTGCAG-TTATTTTTTGATGAAATTTTGGGGCGG 


EcoRI 
PstI 


ORF130 


Forward 
Reverse 


AAAGAATTC-GCAGTACTTGCCATTCTCGGTGCG 
AAACTGCAG-CTCCGGATCGTCTGTAAACGCATT 


EcoRI 
PstI 


ORF 131 


Forward 
Reverse 


GCGGATCCCATATG-GAAATTCGGGCAATAAAAT 
CCCGCTCGAG-CCAGCGGACGCGTTC 


BamHI-Ndel 
Xhol 


ORF 132 


Forward 
Reverse 


GCGGATCCCATATG-AAAGAAGCGGGGTTTG 
CCCGCTCGAG-CCAATCTGCCAGCCGT 


BamHI-Ndel 
Xhol 


ORF 133 


Forward 
Reverse 


CGCGGATCCCATATG-GAAGATGCAGGGCGCG 
CCCGCTCGAG-AAACTTGTAGCTCATCGT 


BamHI-Ndel 
Xhol 


ORF 134 


Forward 
Reverse 


GCGGATCCCATATG-TCTGTGCAAGCAGTATTG 
CCCGCTCGAG-ATCCTGTGCCAATGCG 


BamHI-Ndel 
Xhol 


ORF 135 


Forward 
Reverse 


GCGGATCCCATATG-CCGTCTGAAAAAGCTTT 
CCCGCTCGAG-AAATACCGCTGAGGATG 


BamHI-Ndel 
Xhol 


ORF 136 


Forward 
Reverse 


CGCGGATCCGCTAGC-ATGAAGCGGCGTATAGCC 
CCCGCTCGAG-TTCCGAATATTTGGAACTTTT 


BamHI-Nhel 
Xhol 


ORF 137 


Forward 
Reverse 


CGCGGATCCCATATG-GGCACGGCGGGAAATA 
CCCGCTCGAG-ATAACGGTATGCCGCC 


BamHI-Ndel 


ORF 138 


Forward 
Reverse 


GCGGATCCCATATG-TTTCGTTTACAATTCAGGC 
CCCGCTCGAG-CGGCGTTTTATAGCGG 


BamHI-Ndel 
Xhol 


ORF 139 


Forward 
Reverse 


GCGGATCCCATATG-GCTTTTTTGGCGGTAATG 
CCCGCTCGAG-TAACGTTTCCGTGCGTTT 


BamHI-Ndel 
Xhol 
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ORF140 


Forward 
Reverse 


GCGGATCCCATATG-TTGCCCACAGGCAGC 
CCCGCTCGAG-GACGATGGCAAACAGC 


BamHI-Ndel 
Xhol 


ORF141 


Forward 
Reverse 


GCGGATCCCATATG-CCGTCTGAAGCAGTCT 
CCCGCTCGAG-ATCTGTTGTTTTTAAAATATT 


BamHI-Ndel 
Xhol 


ORF142 


Forward 
Reverse 


GCGGATCCCATATG-GATAATTCTGGTAGTGAAG 
CCCGCTCGAG-AAACGTATAGCCTACCT 


BamHI-Ndel 
Xhol 


ORF143 


Forward 
Reverse 


GCGGATCCCATATG-GATACCGCTTTGAACCT 
CCCGCTCGAG-AATGGCTTCCGCAATATG 


BamHI-Ndel 
Xhol 


ORF144 


Forward 
Reverse 


GCGGATCCCATATG-ACCTTTTTACAACGTTTGC 
CCCGCTCGAG-AGATTGTTGTTGTTTTTTCG 


BamHI-Ndel 
Xhol 


ORri47 


Forward 
Reverse 


GCGGATCCCATATG-TCTGTCTTTCAAACGGC 
CCCGCTCGAG-TTTGTTTTTGCAAGACAG 


BamHI-Ndel 
Xhol 



NB: 

- restriction sites are underlined 



- for ORFs 110-130, where the ORF itself carries an £coiei site (eg. ORF122), a&/I site 
was used in the forward primer instead. Similarly, where the ORF carries a Pstl site {eg. 
5 ORFs 1 1 5 and 1 27), a Sail site was used in the reverse primer. 
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TABLE II - Summary of cloning, expression and purification 



ORF 


PCR/cIoning 


His-fusion 
expression 


GST-fusion 
expression 


Purification 


orf 1 


+ 


+ 


+ 


His-fiision 


orf2 


+ 


+ 


+ 


GST-fiision 


orf 2.1 


+ 


n.d. 


+ 


GST-fiision 


orf 4 


+ 


+ 


+ 


His-fusion 


orf 5 


+ 


n.d. 


+ 


GST-fiision 


orf 6 


+ 


+ 


+ 


GST-fiision 


orf? 


+ 


+ 


+ 


GST-fiision 


orf 8 


+ 


n.d. 


n.d. 




orf 9 


+ 


+ 


+ 


GST-fiision 


orf 10 


+ 


n.d. 


n.d. 




orf 11 


+ 


n.d. 


n.d. 




orf 13 


+ 


n.d. 


+ 


GST-fiision 


orf 15 


+ 


+ 


+ 


GST-fiision 


orf 17 


+ 


n.d. 


n.d. 




orf 18 


+ 


n.d. 


n.d. 




orf 19 


+ 


n.d. 


n.d. 




orf 20 


+ 


n.d. 


n.d. 




orf 22 


+ 


+ 


+ 


GST-fiision 


orf 23 


+ 


+ 


+ 


His-fiision 


orf 24 


+ 


n.d. 


n.d. 




orf 25 


+ 


+ 


+ 


His-fiision 


orf 26 


+ 


n.d. 


n.d. 




orf 27 


+ 


+ 


+ 


GST-fiision 


orf 28 


+ 


+ 


+ 


GST-fiision 


orf 29 


+ 


n.d. 


n.d. 




orf 32 


+ 


+ 


+ 


His-fiision 


orf 33 


+ 


n.d. 


n.d. 




orf 35 


+ 


n.d. 


n.d. 




orf 37 


+ 


+ 


+ 


GST-fiision 


orf 58 


+ 


n.d. 


n.d. 




orf 65 


+ 


n.d. 


n.d. 




orf 66 


+ 


n.d. 


n.d. 




orf 72 


+ 


+ 


n.d. 


His-fiision 


orf 73 


+ 


n.d. 


+ 


n.d. 


orf 75 


+ 


n.d. 


n.d. 




orf 76 


+ 


+ 


n.d. 


His-fiision 


orf 79 


+ 


+ 


n.d. 


His-fiision 


orf 83 


+ 


n.d. 


+ 


n.d. 


orf 84 


+ 


n.d. 


n.d. 
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orf85 


+ 


n.d. 


+ 


GST-fusion 


orf89 


+ 


n.d. 


+ 


GST-fosion 


orf97 


+ 


+ 


+ 


GST-fusion 


orf98 


+ 


n.d. 


n.d. 




orflOO 


+ 


n.d. 


n.d. 




orflOl 


+ 


n.d. 


n.d. 




orfl02 


+ 


n.d. 


n.d. 




orfl03 


+ 


n.d. 


n.d. 




orfl04 


+ 


n.d. 


n.d. 




orflOS 


+ 


n.d. 


n.d. 




orf 106 


-h 


+ 


+ 


His-fusion 


orfl09 


+ 


n.d. 


n.d. 




orf 110 


+ 


n.d. 


n.d. 




orf 111 


+ 


+ 


n.d. 


His-fiision 


orf 113 


+ 


+ 


n.d. 


His-fusion 


orf 115 


n.d. 


n.d. 


n.d. 




orf 119 


+ 


+ 


n.d. 


His-fusion 


orf 120 


+ 


+ 


n.d. 


His-fusion 


orf 121 


+ 


n.d. 


n.d. 




orf 122 


+ 


+ 


n.d. 


His-fusion 


orf 125 


+ 


+ 


n.d. 


His-fiasion 


orf 126 


+ 


+ 


n.d. 


His-fusion 


orf 127 


+ 


+ 


n.d. 


His-fusion 


orf 128 


+ 


n.d. 


n.d. 




orf 129 


+ 


+ 


n.d. 


His-fusion 


orf 130 


+ 


ii.d. 


n.d. 




orf 131 


+ 


+ 


+ 


n.d. 


orf 132 


+ 


+ 


+ 


His-fusion 


orf 133 


+ 


n.d. 


+ 


GST-fusion 


orf 134 


+ 


n.d. 


n.d. 




orf 135 


+ 


n.d. 


n.d. 




orf 136 


+ 


n.d. 


n.d. 




orf 137 


+ 


n.d. 


+ 


GST-fusion 


orf 138 


+ 


n.d. 


+ 


GST-fiision 


orf 139 


+ 


n.d. 


n.d. 




orf 140 


+ 


n.d. 


n.d. 




orf 141 


+ 


n.d. 


n.d. 




orf 142 


+ 


n.d. 


n.d. 




orf 143 


+ 


n.d. 


n.d. 




orf 144 




n.d. 


+ 


n.d. 


orf 147 


+ 


n.d. 


n.d. 
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CLAIMS 

1 . A protein comprising an amino acid sequence selected from the group consisting of SEQ 
IDs 2, 4, 6, and 8. 

2. A nucleic acid molecule which encodes a protein according to claim 1 . 

5 3. A nucleic acid molecule according to claim 2, comprising a nucleotide sequence selected 
from the group consisting of SEQ IDs I, 3, 5, and 7. 

4. A protein comprising an amino acid sequence selected from the group consisting of SEQ 
IDs 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 
54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, 

10 104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134, 136, 138, 140, 142, 
144, 146, 148, 150, 152, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 182, 
184, 186, 188, 190, 192, 194, 196, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 
224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 256, 258, 260, 262, 
264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 

15 304, 306, 308, 310, 312, 314, 316, 318, 320, 322, 324, 326, 328, 330, 332, 334, 336, 338, 340, 342, 
344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376, 378, 380, 382, 
384, 386, 388, 390, 392, 394, 396, 398, 400, 402, 404, 406, 408, 410, 412, 414, 416, 418, 420, 422, 
424, 426, 428, 430, 432, 434, 436, 438, 440, 442, 444, 446, 448, 450, 452, 454, 456, 458, 460, 462, 
464, 466, 468, 470, 472, 474, 476, 478, 480, 482, 484, 486, 488, 490, 492, 494, 496, 498, 500, 502, 

20 504, 506, 508, 510, 512, 514, 516, 518, 520, 522, 524, 526, 528, 530, 532, 534, 536, 538, 540, 542, 
544, 546, 548, 550, 552, 554, 556, 558, 560, 562, 564, 566, 568, 570, 572, 574, 576, 578, 580, 582, 
584, 586, 588, 590, 592, 594, 596, 598, 600, 602, 604, 606, 608, 610, 612, 614, 616, 618, 620, 622, 
624, 626, 628, 630, 632, 634, 636, 638, 640, 642, 644, 646, 648, 650, 652, 654, 656, 658, 660, 662, 
664, 666, 668, 670, 672, 674, 676, 678, 680, 682, 684, 686, 688, 690, 692, 694, 696, 698, 700, 702, 

25 704, 706, 708, 710, 712, 714, 716, 718, 720, 722, 724, 726, 728, 730, 732, 734, 736, 738, 740, 742, 
744, 746, 748, 750, 752, 754, 756, 758, 760, 762, 764, 766, 768, 770, 772, 774, 776, 778, 780, 782, 
784, 786, 788, 790, 792, 794, 796, 798, 800, 802, 804, 806, 808, 810, 812, 814, 816, 818, 820, 822, 
824, 826, 828, 830, 832, 834, 836, 838, 840, 842, 844, 846, 848, 850, 852, 854, 856, 858, 860, 862, 
864, 866, 868, 870, 872, 874, 876, 878, 880, 882, 884, 886, 888, 890, & 892.. 



30 5 . A protein having 5 0% or greater sequence identity to a protein according to claim 4 . 
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6. A protein comprising a fragment of an amino acid sequence selected from the group 
consisting of SEQ IDs 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 
44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 
96, 98, 100, 102, 104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134, 

5 136, 138, 140, 142, 144, 146, 148, 150, 152, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 
176, 178, 180, 182, 184, 186, 188, 190, 192, 194, 196, 198, 200, 202, 204, 206, 208, 210, 212, 214, 
216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 
256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 
296, 298, 300, 302, 304, 306, 308, 310, 312, 314, 316, 318, 320, 322, 324, 326, 328, 330, 332, 334, 

10 336, 338, 340, 342, 344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 
376, 378, 380, 382, 384, 386, 388, 390, 392, 394, 396, 398, 400, 402, 404, 406, 408, 410, 412, 414, 
416, 418, 420, 422, 424, 426, 428, 430, 432, 434, 436, 438, 440, 442, 444, 446, 448, 450, 452, 454, 
456, 458, 460, 462, 464, 466, 468, 470, 472, 474, 476, 478, 480, 482, 484, 486, 488, 490, 492, 494, 
496, 498, 500, 502, 504, 506, 508, 510, 512, 514, 516, 518, 520, 522, 524, 526, 528, 530, 532, 534, 

15 536, 538, 540, 542, 544, 546, 548, 550, 552, 554, 556, 558, 560, 562, 564, 566, 568, 570, 572, 574, 
576, 578, 580, 582, 584, 586, 588, 590, 592, 594, 596, 598, 600, 602, 604, 606, 608, 610, 612, 614, 
616, 618, 620, 622, 624, 626, 628, 630, 632, 634, 636, 638, 640, 642, 644, 646, 648, 650, 652, 654, 
656, 658, 660, 662, 664, 666, 668, 670, 672, 674, 676, 678, 680, 682, 684, 686, 688, 690, 692, 694, 
696, 698, 700, 702, 704, 706, 708, 710, 712, 714, 716, 718, 720, 722, 724, 726, 728, 730, 732, 734, 

20 736, 738, 740, 742, 744, 746, 748, 750, 752, 754, 756, 758, 760, 762, 764, 766, 768, 770, 772, 774, 
776, 778, 780, 782, 784, 786, 788, 790, 792, 794, 796, 798, 800, 802, 804, 806, 808, 810, 812, 814, 
816, 818, 820, 822, 824, 826, 828, 830, 832, 834, 836, 838, 840, 842, 844, 846, 848, 850, 852, 854, 
856, 858, 860, 862, 864, 866, 868, 870, 872, 874, 876, 878, 880, 882, 884, 886, 888, 890, & 892.. 

7. An antibody which binds to a protein according to any one of claims 4 to 6. 

25 8. A nucleic acid molecule which encodes a protein according to any one of claims 4 to 6. 

9. A nucleic acid molecule according to claim 8, comprising a nucleotide sequence selected 
from the group consisting of SEQ IDs 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 
37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 
89, 91, 93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129, 
30 131, 133, 135, 137, 139, 141, 143, 145, 147, 149, 151, 153, 155, 157, 159, 161, 163, 165, 167, 169, 
171, 173, 175, 177, 179, 181, 183, 185, 187, 189, 191, 193, 195, 197, 199, 201, 203, 205, 207, 209, 
211, 213, 215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 
251, 253, 255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289, 
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291, 293, 295, 297, 299, 301, 303, 305, 307, 309, 311, 313, 315, 317, 319, 321, 323, 325, 327, 329, 
331, 333, 335, 337, 339, 341, 343, 345, 347, 349, 351, 353, 355, 357, 359, 361, 363, 365, 367, 369, 
371, 373, 375, 377, 379, 381, 383, 385, 387, 389, 391, 393, 395, 397, 399, 401, 403, 405, 407, 409, 
411, 413, 415, 417, 419, 421, 423, 425, 427, 429, 431, 433, 435, 437, 439, 441, 443, 445, 447, 449, 
5 451, 453, 455, 457, 459, 461, 463, 465, 467, 469, 471, 473, 475, 477, 479, 481, 483, 485, 487, 489, 
491, 493, 495, 497, 499, 501, 503, 505, 507, 509, 511, 513, 515, 517, 519, 521, 523, 525, 527, 529, 
531, 533, 535, 537, 539, 541, 543, 545, 547, 549, 551, 553, 555, 557, 559, 561, 563, 565, 567, 569, 
571, 573, 575, 577, 579, 581, 583, 585, 587, 589, 591, 593, 595, 597, 599, 601, 603, 605, 607, 609, 
611, 613, 615, 617, 619, 621, 623, 625, 627, 629, 631, 633, 635, 637, 639, 641, 643, 645, 647, 649, 

10 651, 653, 655, 657, 659, 661, 663, 665, 667, 669, 671, 673, 675, 677, 679, 681, 683, 685, 687, 689, 
691, 693, 695, 697, 699, 701, 703, 705, 707, 709, 711, 713, 715, 717, 719, 721, 723, 725, 727, 729, 
731, 733, 735, 737, 739, 741, 743, 745, 747, 749, 751, 753, 755, 757, 759, 761, 763, 765, 767, 769, 
771, 773, 775, 777, 779, 781, 783, 785, 787, 789, 791, 793, 795, 797, 799, 801, 803, 805, 807, 809, 
811, 813, 815, 817, 819, 821, 823, 825, 827, 829, 831, 833, 835, 837, 839, 841, 843, 845, 847, 849, 

15 851, 853, 855, 857, 859, 861, 863, 865, 867, 869, 871, 873, 875, 877, 879, 881, 883, 885, 887, 889, 
&891.. 

10. A nucleic acid molecule comprising a fragment of a nucleotide sequence selected from the 
group consisting of SEQ IDs 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 
41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 

20 93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129, 131, 133, 
135, 137, 139, 141, 143, 145, 147, 149, 151, 153, 155, 157, 159, 161, 163, 165, 167, 169, 171, 173, 
175, 177, 179, 181, 183, 185, 187, 189, 191, 193, 195, 197, 199, 201, 203, 205, 207, 209, 211, 213, 
215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, 253, 
255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289, 291, 293, 

25 295, 297, 299, 301, 303, 305, 307, 309, 311, 313, 315, 317, 319, 321, 323, 325, 327, 329, 331, 333, 
335, 337, 339, 341, 343, 345, 347, 349, 351, 353, 355, 357, 359, 361, 363, 365, 367, 369, 371, 373, 
375, 377, 379, 381, 383, 385, 387, 389, 391, 393, 395, 397, 399, 401, 403, 405, 407, 409, 411, 413, 
415, 417, 419, 421, 423, 425, 427, 429, 431, 433, 435, 437, 439, 441, 443, 445, 447, 449, 451, 453, 
455, 457, 459, 461, 463, 465, 467, 469, 471, 473, 475, 477, 479, 481, 483, 485, 487, 489, 491, 493, 

30 495, 497, 499, 501, 503, 505, 507, 509, 511, 513, 515, 517, 519, 521, 523, 525, 527, 529, 531, 533, 
535, 537, 539, 541, 543, 545, 547, 549, 551, 553, 555, 557, 559, 561, 563, 565, 567, 569, 571, 573, 
575, 577, 579, 581, 583, 585, 587, 589, 591, 593, 595, 597, 599, 601, 603, 605, 607, 609, 611, 613, 
615, 617, 619, 621, 623, 625, 627, 629, 631, 633, 635, 637, 639, 641, 643, 645, 647, 649, 651, 653, 
655, 657, 659, 661, 663, 665, 667, 669, 671, 673, 675, 677, 679, 681, 683, 685, 687, 689, 691, 693, 

35 695, 697, 699, 701, 703, 705, 707, 709, 711, 713, 715, 717, 719, 721, 723, 725, 727, 729, 731, 733, 
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735, 737, 739, 741, 743, 745, 747, 749, 751, 753, 755, 757, 759, 761, 763, 765, 767, 769, 771, 773, 
775, 777, 779, 781, 783, 785, 787, 789, 791, 793, 795, 797, 799, 801, 803, 805, 807, 809, 811, 813, 
815, 817, 819, 821, 823, 825, 827, 829, 831, 833, 835, 837, 839, 841, 843, 845, 847, 849, 851, 853, 
855, 857, 859, 861, 863, 865, 867, 869, 871, 873, 875, 877, 879, 881, 883, 885, 887, 889, & 891.. 

5 11. A nucleic acid molecule comprising a nucleotide sequence complementary to a nucleic acid 
molecule according to any one of claims 8 to 10. 

12. A nucleic acid molecule comprising a nucleotide sequences having 50% or greater sequence 
identity to a nucleic acid molecule according to any one of claims 8-11. 

13. A nucleic acid molecule which can hybridise to a nucleic acid molecule according to any 
10 one of claims 8-12 under high stringency conditions. 

14. A composition comprising a protein, a nucleic acid molecule, or an antibody according to 
any preceding claim. 

15. A composition according to claim 14 being a vaccine composition or a diagnostic 
composition. 

15 16. A composition according to claim 1 4 or claim 1 5 for use as a pharmaceutical. 

1 7. The use of a composition according to claim 14 in the manufacture of a medicament for the 
treatment or prevention of infection due to Neisserial bacteria. 
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